ABSTRACT

This  paper  addresses  a  number  of  psychological  issues  pertaining  to  display  design.  We  review 
the  literature  comparing  3-D  and  2-D  displays  and  evaluate  the  findings  in  terms  of  sub-surface 
environments.  In  addition  to  the  specific  problem  of  display  dimensionality  this  paper  outlines  a 
number  of  perceptual,  cognitive  and  ecological  factors  that  are  relevant  to  display  design  for 
submarine  environments.  The  Generative  Transformational  approach  to  visual  perception  is 
outlined  and  the  relevance  of  transformational  theory  to  display  design  is  discussed.  The  paper 
also  discusses  a  number  of  practical  and  theoretical  factors  relevant  to  empirical  assessment  of 
display  utility  and  outlines  three  key  areas  for  future  research  -  representing  uncertainty,  using  a 
cognitive  model  of  human  decision  making,  and  conveying  affective  information  -  that  have  the 
potential  to  uncover  novel  theoretical  developments  and  new  technologies. 

Psychological Implications for
Submarine  Display  Design 


Executive  Summary 

It  is  widely  anticipated  that  technological  advancements  in  display  design  should  lead 
to  corresponding  increases  in  situation  awareness  and  improved  task  performance 
among  display  users.  However,  it  should  not  be  assumed  that  there  is  an  immediate 
transfer  from  technological  advancement  to  overall  improvement.  While  designers  are 
now  capable  of  designing  realistic,  dynamic,  and  information  rich  displays,  the 
information  processing  abilities  of  display  users  have  remained  static.  If  display 
designers  fail  to  understand  the  abilities  of  the  display  users,  then  it  is  unlikely  that  the 
full  potential  of  the  new  technologies  can  be  realised.  Against  this  background,  the  aim 
of  this  paper  is  to  review  psychological  issues  pertaining  to  display  design. 

The  paper  addresses  perceptual  factors  such  as  colour  perception  and  luminosity 
contrast,  cognitive  factors  such  as  short-  and  long-term  memory  limitations  and 
decision  making  heuristics,  and  ecological  factors  such  as  task  requirements  and 
information  type  and  structure.  This  paper  also  reviews  the  literature  regarding  the  use 
of  two-  and  three-dimensional  displays,  and  makes  a  number  of  suggestions  in  relation 
to  display  dimensionality  in  sub-surface  environments.  Additionally,  the  Generative 
Transformational Approach to visual perception is briefly outlined and its relevance to
display design is discussed.

The  paper  discusses  a  number  of  practical  and  theoretical  factors  relevant  to  empirical 
assessment  of  display  utility,  such  as  the  benefits  and  limitations  of  qualitative  and 
quantitative  research,  and  the  competing  pressures  of  internal  and  external  validity.  It 
is  suggested  that  researchers  adopt  a  strategy  whereby  initial,  simple  experiments 
based  on  well-replicated  principles  and  involving  semi-realistic  tasks  are  carried  out, 
the  findings  of  which  are  then  used  to  inform  the  design  of  subsequent,  more  complex 
and  realistic  experiments. 

The  need  for  further  research  in  three  key  areas  is  identified  from  the  literature  review. 
It  is  suggested  that  research  in  these  areas  has  the  potential  to  uncover  "game- 
changing"  theoretical  developments  and  new  technologies.  These  areas  are:  the 
representation  of  different  types  of  uncertainty,  decision  making  aids,  and  the  use  of 
affective  data. 

1  This  paper  was  researched  by  Matthew  Dry,  Michael  Lee  and  Douglas  Vickers  of  the  University  of 
Adelaide  under  contract  to  Maritime  Operations  Division,  DSTO.  It  has  been  edited  and  formatted 
by  Sam  Huf,  DSTO. 

1. Introduction


Recent  technological  advances  have  seen  a  rise  in  computer  information  processing  speed  and 
increased  sophistication  in  data  visualisation.  It  is  widely  anticipated  that  incorporating  these 
technological  advancements  into  display  design  should  lead  to  corresponding  increases  in 
situation  awareness  and  improved  task  performance  amongst  display  users.  However,  there 
is  a  degree  of  risk  in  assuming  a  one-to-one  transfer  between  technological  advancement  and 
overall  improvement.  While  designers  are  now  capable  of  designing  realistic,  dynamic  and 
information  rich  displays,  the  information  processing  abilities  of  display  users  are  unchanged. 
Chalmers,  Easter  and  Potter  (2000)  have  suggested  that  faster  computers  may  actually  have  a 
detrimental  effect  on  task  performance,  as  they  can  create  a  higher  level  of  cognitive  demand 
and  information  overload.  In  order  to  avoid  this  scenario,  it  is  necessary  to  recognise  the  need 
for  psychological  insight  into  the  problem. 

Display  design  should  be  thought  of  in  terms  of  human-computer  interfacing.  If  the  display 
and  the  display  user  are  conceptualised  as  a  system  rather  than  as  distinct  entities,  it  becomes 
obvious  that  task  performance  is  contingent  upon  the  abilities  and  limitations  of  both  units.  If 
display  designers  fail  to  address  the  abilities  of  the  display  users,  then  it  is  unlikely  that  the 
full  potential  of  the  new  technologies  can  be  realised. 

In  light  of  these  points  this  paper  provides  a  review  of  the  perceptual,  cognitive  and  ecological 
factors  that  are  relevant  to  display  design  for  submarine  environments.  A  current  issue  in 
display  design  is  the  relative  utility  of  two-dimensional  and  three-dimensional  visualisations. 
The  majority  of  past  research  has  focused  on  aeronautical  and  terrestrial  applications.  This 
paper  reviews  the  literature  and  makes  a  number  of  suggestions  regarding  the  use  of  two-  and 
three-dimensional  displays  in  sub-surface  environments.  Elsewhere,  Vickers  has  outlined  the 
Generative  Transformational  Approach  to  Visual  Perception  (Vickers,  2002).  This  paper 
briefly  discusses  the  implications  of  Generative  Transformational  theory  for  display  design. 
Finally,  the  paper  also  discusses  a  number  of  practical  and  theoretical  factors  relevant  to 
empirical  assessment  of  display  utility,  and  outlines  a  number  of  future  research  directions. 

2. Human Psychology and Display Utility

2.1  Perceptual  Factors  Influencing  Display  Utility 

Perception  can  be  thought  of  as  a  collective  term  for  those  processes  that  organise  sensory 
input  into  meaningful  patterns.  It  is  important  to  note  that  there  is  not  always  a 
straightforward  relationship  between  the  physical  nature  of  sensory  input  and  the  way  that 
this  input  is  perceived.  For  example,  most  people  should  be  familiar  with  visual  illusions 
similar  to  Figure  1,  in  which  two  lines  appear  to  be  of  qualitatively  different  lengths  despite 
being  physically  equal.  Given  that  the  human  visual  system  is  susceptible  to  illusory 
phenomena  such  as  these,  visual  display  design  should  be  informed  by  knowledge  of  the 
capabilities  of  human  perception.  This  section  outlines  a  number  of  perceptual  factors  that 
may  affect  display  utility. 


Figure  1:  The  Mueller-Lyer  illusion,  in  which  the  upper  horizontal  line  appears  to  be  longer  than  the 
lower  horizontal  line,  despite  the  fact  that  both  lines  are  the  same  length 


2.1.1  Achromatic  effects 

A  number  of  researchers  have  demonstrated  that  the  brightness  of  objects  or  areas  is 
dependent  upon  the  luminance  of  adjacent  areas  (Kingdom,  1997).  For  example,  it  has  been 
demonstrated  that  two  identical  grey  areas  appear  to  be  different  in  terms  of  brightness  if  one 
of  the  areas  is  presented  on  a  black  background  and  the  other  on  a  white  background  (Figure 
2)  (White,  1979).  Figure  3  illustrates  that  this  effect  can  also  occur  in  more  naturalistic  settings. 
The  light  grey  checked  area  that  lies  in  shadow  is  in  fact  the  same  brightness  as  the  dark  grey 
checked  area  that  is  not  in  shadow.  This  has  obvious  implications  for  the  use  of  grey  scales  to 
convey information. For example, if grey-scaling is used to indicate the threat-status of target
platforms,  then  a  neutral  or  unspecified  platform  may  be  wrongly  perceived  as  either  a  friend 
or  enemy,  depending  upon  the  location  of  the  platform  on  the  display  and  the  relative 
brightness  of  its  background. 


Figure  2.  Although  the  grey  area  on  the  white  background  appears  to  be  darker  than  the  grey  area  on 
the  dark  background,  the  two  areas  are  actually  of  equal  brightness 


Figure 3. Despite appearances, the light grey check that lies in shadow (a) is actually of the same
brightness  as  the  dark  checks  that  are  not  in  shadow  (b) 


An  alternative  to  using  grey-scales  for  portraying  information,  sometimes  used  in  graphical 
displays,  is  to  use  different  patterns  or  crosshatching.  However,  display  designers  should  be 
aware of the possibility of creating Moire effects: areas of water- or silk-like shimmering that
result  from  closely  spaced  lines  (Figure  4)  (Laisk,  1971).  Moire  effects  can  be  distracting  and 
may  potentially  cause  fatigue.  Display  designers  should  also  be  aware  of  other  illusory  effects 
that  can  occur  when  lines  or  line  segments  are  overlaid.  For  example,  straight  lines  can  appear 
to  be  curved  when  placed  against  a  background  of  curved  or  radiating  lines  (Sanders  & 
McCormick,  1993).  This  might  be  problematic  in  3-D  displays  as  it  could  distort  depth 
perception. 


Figure  4.  Area  of  shimmering  caused  by  a  Moire  effect 

2.1.2  Colour  perception 

The  use  of  colour  within  displays  raises  a  number  of  potential  perceptual  problems.  For 
example,  a  well-established  body  of  data  exists  suggesting  that  red  and  blue  should  not  be 
used  next  to  each  other  within  displays  as  they  can  cause  a  number  of  distracting  effects 
(Kosslyn,  1994).  One  example  of  this  is  known  as  chromatic  aberration:  because  of  the 
difference  in  wavelength  between  red  light  and  blue  light,  the  eye  is  unable  to  focus  on  both 
colours  at  the  same  time.  As  a  consequence  of  this,  the  colours  will  appear  to  shimmer  as  they 
move in and out of focus (Matthews & Mertins, 1989). To avoid chromatic aberration
it is suggested that red and blue not be used in close proximity.¹

Another  perceptual  phenomenon  associated  with  the  use  of  red  and  blue  within  displays  is 
chromostereopsis.  In  this  case,  when  red  and  blue  are  simultaneously  presented  on  a  dark 
background  the  two  colours  will  be  perceived  as  being  separated  within  the  depth  plane.  Red 
will  be  seen  as  being  closer  to  the  viewer  and  blue  as  being  further  away  (Matthews  & 
Mertins,  1989).  Similarly,  Kosslyn  indicates  that  warm  coloured  objects  that  are  placed  behind 
cool coloured objects will struggle to move into the foreground (Kosslyn, 1994). Both of these
effects are potentially distracting, and there is evidence to suggest that they may cause visual
fatigue (Murch, 1984 cited in Helander, 1987). To avoid chromostereopsis it has been
suggested that "colours should not be used in situations where less than 3.39' of disparity can
be recognised" (McLain, Cacioppa, Reising, & Koubek, 1990).

1 This is interesting because red and blue appear together on many military displays representing
hostile and friendly tracks.

There  are  a  number  of  perceptual  phenomena  that  should  be  considered  when  colour  is  to  be 
used  to  convey  quantitative  or  qualitative  information.  Individual  colours  are  perceived  as 
being  different  because  they  are  comprised  of  a  mixture  of  various  light  wavelengths.  A 
number  of  factors  may  alter  this  mixture,  thereby  distorting  the  user's  colour  perception.  For 
example,  the  ambient  lighting  of  the  environment  in  which  a  display  is  being  viewed  may 
alter  the  colour-mix:  if  the  colour  of  the  light  in  the  display  room  changes  from,  say,  white  to 
red,  then  this  will  have  an  effect  upon  the  colours  on  the  display  screen  (Foley  &  Moray,  1987; 
Helander,  1987). 

It  should  also  be  noted  that  colour  perception  in  humans  is  almost  entirely  absent  at  low-light 
levels.  The  discrimination  of  the  temporal  frequency  of  visible  light  (yielding  colour)  is 
handled  by  what  is  termed  the  photopic  system  (utilising  the  response  of  cone  cells).  This 
system  is  not  responsive  in  low-level  light.  At  those  lower  light  levels  a  scotopic  system 
(primarily  utilising  rod  sensitivity)  becomes  more  useful.  Scotopic  vision  (night  vision)  is 
virtually  monochromatic  (Coren,  Ward  &  Enns,  1994).  Hence,  display  components  depending 
on  colour  discrimination  (colour  saturation)  probably  should  be  avoided  for  operations  at 
very  low-light  levels. 

Other  effects  that  may  cause  colour  distortion  are  simultaneous  colour  contrast  and  chromatic 
adaptation  (Palmer,  1999).  Simultaneous  colour  contrast  occurs  when  the  perception  of  a 
colour  is  influenced  by  the  colours  that  surround  it.  Depending  on  the  saturation  of  the  colour 
and  the  time  it  is  viewed  for,  the  colour  of  an  object  will  shift  in  hue  toward  the 
complementary  hue  of  the  background  colour.  Chromatic  adaptation  occurs  after  prolonged 
exposure  to  a  specific  colour.  This  will  cause  a  reduction  in  sensitivity  to  that  colour  in  the 
period  immediately  following  (Palmer,  1999).  For  example,  if  the  user  is  exposed  to  a  blue 
'loading'  screen  for  a  prolonged  period  they  may  subsequently  be  less  sensitive  to  objects  and 
symbols  of  that  hue  within  the  display. 

Colour  afterimages  (Palmer,  1999)  occur  after  viewing  highly  saturated  colours  for  a 
prolonged  period  of  time.  Afterimages  are  the  complementary  hue  of  the  original  colour.  For 
example,  green  objects  have  a  red  afterimage  and  black  objects  have  a  white  afterimage. 
Individuals  commonly  experience  monochromatic  afterimages  after  staring  at  bright  lights. 
The  phenomenon  may  be  a  potential  distraction  similar  to  chromostereopsis  or  chromatic 
aberration.  Designers  should  be  mindful  of  this  aspect  of  colour  perception. 


2.1.3  Blurring 

Korge  and  Kreuger  (1984,  cited  in  Grandjean,  1987)  investigated  the  effects  of  object  blurring 
or  sharpness  upon  accommodation,  finding  that  observing  blurred  characters  led  to  a  shift  in 
accommodation to the resting position. Bisantz, Kesevadas, Scott, Lee, Basapur, Bhide,
Sharma and Roth (2002) have suggested the use of blurring to indicate uncertainty in track
position, heading and type within visual displays (Figure 5). To date there has been no
empirical investigation of the usefulness of this technique, but given that there is a small
amount of evidence to suggest that blurred characters or objects can cause fatigue and
discomfort (Grandjean, 1987), sharpness should probably not be used as a coding device.


Figure  5.  The  use  of  sharpness  or  blurring  to  convey  uncertainty  (Bisantz  et  al.,  2002) 


2.1.4  Perceptual  Abnormalities 

Colour-blindness  affects  an  individual's  ability  to  perceive  or  distinguish  between  colours.  It 
has  been  estimated  that  colour-blindness  occurs  in  8%  of  the  male  population,  and  1%  of 
females (Kosslyn, 1994; Palmer, 1999); however, there are a number of different forms of
colour-blindness,  and  the  severity  of  the  effects  can  vary.  The  cause  of  colour  perception 
abnormalities  can  be  situated  at  either  a  retinal  or  cortical  level.  At  a  retinal  level,  normal 
human colour perception relies upon three types of photoreceptors: L-, M- and S-cones.
L-cones are sensitive to long-wavelength light, M-cones to middle-wavelength light,
and S-cones to short-wavelength light. Colour-blind individuals are typically
missing  one  or  two  of  these  types  of  cells,  and  the  type  and  extent  of  an  individual's 
perceptual  deficit  is  contingent  upon  the  type  of  cone  cells  that  they  are  missing  (Stockman  & 
Sharp,  1998). 

Individuals with the most common forms of retinal colour-blindness have missing (or
malfunctioning) L- or M-cones and therefore perceive red and green hues as shades of
grey (Palmer, 1999). As a result they are unable to discriminate between red and green
hues. Additionally, this deficit affects the discrimination of colours that differ in luminance
according to the degree of green or red in their make-up (Helander, 1987), and also affects an
individual's ability to distinguish between these two colours and greys of equal luminance.
Less common forms of retinal colour-blindness include tritanopia and monochromatopia.
Tritanopia affects the ability to discriminate yellows and blues. Tritanopic individuals
experience these colours as shades of grey, and are therefore subject to problems similar to
those of red/green colour-blind individuals. Monochromats have only one type of colour
photoreceptor cell (Stockman & Sharp, 1998) and are therefore only able to distinguish
between variations in lightness, not between different colours.

Colour  perception  impairment  can  also  result  from  cortical  damage.  Individuals  with 
achromatopsia  are  unable  to  perceive  any  colour  at  all  and  see  the  world  in  black  and  white. 
This  disorder  can  affect  either  all  or  part  of  their  visual  field.  Other  disorders  arising  from 
cortical  damage  can  affect  an  individual's  ability  to  recognize  or  name  different  colours  (Kolb 
&  Wishaw,  1999;  Palmer,  1999).  However,  unlike  retinal  colour-blindness,  which  is  a 
congenital  disorder,  cortical  colour-blindness  is  usually  the  result  of  some  form  of  brain 
injury. 

The  non-negligible  proportion  of  colour-blind  individuals  in  the  population  makes  it 
necessary  to  either  screen  users  for  the  condition,  or  design  displays  so  that  they  do  not 
require  colour  differentiation.  However,  any  decision  should  bear  in  mind  the  relative  ease 
with  which  colour-blindness  can  be  detected,  and  the  potential  benefits  and  limitations  of 
using  colour  to  encode  information  (as  outlined  in  sections  2.1.2  and  2.2.6). 
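
As a rough illustration of the second option (designing displays so that colour differentiation is not required), a symbol set can be checked for redundant coding. The Python sketch below assumes a symbol set described as a mapping from category to a (shape, colour) pair; the category names, shapes and colours are hypothetical and are not taken from any actual symbology.

```python
def shape_alone_suffices(symbols):
    """Return True if every category has a unique shape.

    symbols: dict mapping category name -> (shape, colour).
    If shapes are unique, a user who cannot discriminate the colours
    can still tell the categories apart by shape.
    """
    shapes = [shape for shape, _colour in symbols.values()]
    return len(shapes) == len(set(shapes))


# Hypothetical symbol set with redundant shape and colour coding.
redundant = {
    "friendly": ("circle", "blue"),
    "hostile": ("diamond", "red"),
    "neutral": ("square", "green"),
}

# Hypothetical symbol set that relies on colour alone.
colour_only = {
    "friendly": ("circle", "blue"),
    "hostile": ("circle", "red"),
}

print(shape_alone_suffices(redundant))    # True
print(shape_alone_suffices(colour_only))  # False
```

A check of this kind says nothing about how discriminable the shapes themselves are; it only flags symbol sets in which colour is the sole distinguishing feature.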

A  second  form  of  perceptual  abnormality  that  might  potentially  impact  upon  display  utility  is 
stereoblindness.  Stereoblindness  or  stereoanomaly  is  the  inability  to  perceive  differences  in 
depth  when  the  viewer  is  presented  with  stimuli  that  vary  in  degree  of  stereoscopic  disparity. 
It  is  estimated  that  30%  of  the  population  exhibit  errors  in  stereoscopic  depth  judgments,  and 
around  3  to  10%  of  the  population  are  stereoblind  (Palmer,  1999;  van  Ee  &  Richards,  2002). 
This  has  obvious  implications  for  the  use  of  stereoscopically  presented  3-D  displays  and  is 
potentially  a  greater  problem  than  colour-blindness  for  two  reasons.  First,  it  appears  to  affect  a 
larger  proportion  of  the  population.  Secondly,  although  visual  displays  can  be  designed  so 
that  users  do  not  need  to  differentiate  between  colours  it  is  difficult  to  envisage  a  way  of 
making  stereoscopic  displays  accessible  to  stereoanomalous  individuals. 

2.1.5  Perceptual  Factors:  Summary 

This  section  outlined  a  number  of  perceptual  factors  that  may  influence  display  utility.  These 
include: 

•  Achromatic effects, such as luminosity contrast and Moire effects.

•  Chromatic (colour perception) effects that may be distracting, such as chromatic aberration and
chromostereopsis, and factors that may influence the ability to discriminate between two
different colours, such as simultaneous colour contrast, chromatic adaptation, and afterimages.

•  The  possibility  of  distraction  due  to  blurring  or  image  sharpness. 

2.2 Cognitive Factors Influencing Display Utility

Cognition  is  a  broad  term  that  is  used  to  describe  processes  that  are  directly  related  to,  or 
involved  in,  thinking,  conceiving  and  reasoning.  Given  that  the  aim  of  a  visual  display  is  to 
convey  information,  it  is  necessary  to  take  human  cognition  into  account  when  designing 
visual  displays.  This  section  outlines  a  number  of  cognitive  factors  that  may  affect  display 
utility. 

2.2.1  Attention 

People  have  limited  attention  capabilities,  in  the  sense  that  they  can  only  pay  attention  to  a 
limited amount of information at once and over a limited time (Endsley, 1995b). Sanders and
McCormick (1993) identify three different types of attention that are relevant to display
design, and outline several useful guidelines. Selective attention
involves  monitoring  several  sources  of  information  to  perform  a  task,  and  divided  attention 
involves  performing  two  or  more  tasks  simultaneously.  If  display  users  are  performing  tasks 
involving  selective  or  divided  attention  it  is  suggested  that  the  different  information  sources 
be  combined  within  a  single  display,  rather  than  requiring  users  to  monitor  two  or  more 
separate  displays.  However,  integrating  displays  may  lead  to  higher  levels  of  display  clutter 
(Yeh  &  Wickens,  2001),  and  there  is  evidence  to  suggest  that  integrated  displays  may  be  more 
fatiguing,  that  is,  operators  will  maintain  maximum  vigilance  for  a  shorter  period  of  time 
(Sauer,  Wastell,  Hockey,  Crawshaw,  Ishak,  &  Downing,  2002).  Focused  attention  involves 
attending  to  one  or  more  channels  of  information  while  avoiding  being  distracted  by  other 
information.  It  is  suggested  that  displays  being  used  for  focused  attention  tasks  should 
display  only  information  that  is  related  specifically  to  the  task.  Alternatively,  if  this  is  not 
possible,  the  display  information  could  be  subjected  to  intensity  coding  whereby  salient 
information  is  made  brighter,  and  less  important  or  redundant  information  is  faded  (Yeh  & 
Wickens,  2001). 
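
The intensity coding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DisplayItem` structure, the `task_relevant` flag and the dimming factor of 0.4 are hypothetical choices made for the example, not values recommended by Yeh and Wickens (2001).

```python
from dataclasses import dataclass


@dataclass
class DisplayItem:
    label: str
    rgb: tuple            # base colour, 0-255 per channel
    task_relevant: bool   # assumed to be supplied by the task model


def intensity_code(items, dim_factor=0.4):
    """Return (label, rgb) pairs with non-relevant items faded toward black.

    Task-relevant items keep their full intensity; all other items
    have each colour channel scaled by dim_factor.
    """
    coded = []
    for item in items:
        scale = 1.0 if item.task_relevant else dim_factor
        rgb = tuple(round(channel * scale) for channel in item.rgb)
        coded.append((item.label, rgb))
    return coded


items = [
    DisplayItem("track-01", (200, 200, 200), True),
    DisplayItem("gridline", (200, 200, 200), False),
]
print(intensity_code(items))
# [('track-01', (200, 200, 200)), ('gridline', (80, 80, 80))]
```

Because perceived brightness also depends on the luminance of adjacent areas (section 2.1.1), a fixed scaling factor of this kind would need to be evaluated against the actual display background.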

2.2.2  Short-term  memory  limitations 

Williges, Williges and Elkerton (1987) stress the need to minimize the amount of information
that is held in working, or short-term, memory (STM) by interface users, particularly if
multiple tasks are being performed simultaneously. It is
widely accepted that the upper limit of STM is between 5 and 9 items (Miller, 1956), varying
according  to  data  complexity,  presentation  sequence,  the  length  of  time  the  information  must 
be  remembered  for,  and  the  amount  of  competing  information. 

Eddy, Kribs and Cowen (1999) list a number of common errors made by display users that can
be  associated  with  STM  limitations.  These  include  confusing  and  forgetting  track  numbers; 
confusing  track  kinematic  data,  such  as  approaching  versus  departing,  or  climbing  versus 
descending;  and  combining  past  track-related  events  or  actions  with  incorrect  tracks  and 
associating  completed  ownship  actions  with  incorrect  tracks.  Errors  such  as  these  might 
possibly  be  minimized  if  information  such  as  track  heading  and  type  are  explicitly 
represented  within  the  display,  thereby  reducing  the  amount  of  information  needing  to  be 
stored  in  working  memory. 

Another area in which STM limitations may impact upon display utility is in the design of
selection menus (Williges et al., 1987). Wickens (1984) indicates that if users need to consider
all of the menu options simultaneously, the menu will be more effective if the number of items
is less than or equal to the upper limit of STM.
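
A minimal Python sketch of this guideline follows. The default limit of 7 items is an assumption (the midpoint of the 5 to 9 range cited above), and splitting an over-long menu into consecutive groups is an illustrative design choice rather than a recommendation from the sources.

```python
def menu_within_stm(options, stm_limit=7):
    """Return True if all options could be considered at once,
    i.e. the number of options does not exceed the assumed STM limit."""
    return len(options) <= stm_limit


def split_menu(options, stm_limit=7):
    """Split a list of options into consecutive groups, each no longer
    than stm_limit, preserving the original order."""
    return [options[i:i + stm_limit] for i in range(0, len(options), stm_limit)]


options = [f"option-{n}" for n in range(1, 11)]  # ten hypothetical menu items
print(menu_within_stm(options))                  # False
print([len(group) for group in split_menu(options)])  # [7, 3]
```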


2.2.3  Long-term  memory  limitations 

There  is  a  growing  body  of  research  suggesting  that  the  way  objects  are  represented  within  a 
display  may  affect  activities  such  as  visual  search,  naming  and  identification  (Smallman, 
Oonk,  &  St.  John,  2001a;  Smallman,  Oonk,  St.  John,  &  Cowen,  2001b;  Smallman  et  al.,  2000; 
Smallman,  Schiller,  &  Mitchell,  1999).  Task  demands  generally  require  that  information  such 
as  platform  type  (cruiser,  carrier,  submarine),  and  threat  affiliation  (friendly,  hostile,  neutral, 
unknown)  be  implicitly  represented  within  displays.  However,  long-term  memory  (LTM) 
limitations  place  restrictions  upon  the  number  of  different  symbols  that  can  be  used  to 
represent  these  data.  It  is  commonly  assumed  that  there  is  an  upper  limit  to  the  number  of 
different  symbols  a  user  can  remember  successfully,  and  that  this  is  moderated  by  factors  such 
as  practice,  familiarity,  and  stress-level  (Helander,  1987).  Helander  provides  an  outline  of  the 
maximum  and  recommended  number  of  featural  differences  that  should  be  used  for  a  range 
of  coding  dimensions,  such  as  colours,  geometric  shapes  and  icons  (Helander,  1987). 
According  to  Grether  and  Baker  (1972  cited  in  Helander,  1987),  in  optimal  conditions,  with  a 
high  level  of  training  and  familiarity,  and  allowing  a  5%  error  rate,  subjects  can  differentiate 
between up to 15 different geometric shapes.² No function relating error rate to the number of shapes is provided.
However,  to  ensure  a  high  level  of  accuracy  in  operational  conditions  five  different  shapes  is 
the  recommended  limit. 

LTM  limitations  also  appear  to  be  influenced  by  the  number  of  dimensions  used  to  encode 
information,  the  particular  combination  of  dimensions  that  are  used,  and  whether  the 
dimensions  are  orthogonal  or  redundant  (Sanders  &  McCormick,  1993).  Heglin  (1973  cited  in 
Sanders  and  McCormick,  1993)  suggests  that  no  more  than  two  dimensions  should  be 
combined  if  rapid  interpretation  is  required,  and  also  provides  a  list  of  dimensions  that  are 
not  recommended  to  be  used  together.  These  include  flash-rate  and  brightness,  shape  and 
letter  or  number,  and  colour  hue  and  contrast. 

Two  main  approaches  have  been  used  to  represent  objects  within  displays:  realistic  icons 
and  abstract  symbols.  The  use  of  realistic  icons  may  be  regarded  as  being  advantageous 
because  it  is  easy  to  differentiate  between  broad  classes  of  objects.  For  example,  the  helicopter 
icon  in  Figure  6  is  easily  differentiated  from  the  cruiser  icon.  However,  research  has  indicated 
that users have difficulty distinguishing between icons within categories (Smallman et al.,
2001b). For example, the cruiser and carrier icons in Figure 6 are visually similar and therefore
potentially easy to confuse. In contrast, Figure 6 demonstrates there is no such
restriction upon the potential differences between abstract symbols. Arbitrary coding,
however, removes any potential advantage that may be provided by semantic associations
between object type and dimensions such as shape. For example, boat shapes can be used to
represent boats and not aircraft.

2 A reviewer has pointed out that this seems a very limited range of discrimination given that humans
commonly deal with much larger geometric shape sets, such as the English alphabet, or ideographic
languages. Language acquisition and its relation to the written word is a complex field beyond the
scope of this brief report. Once again, the message for designers is that human memory and processes
of discrimination have capacity limitations. These limitations vary dramatically, between experts and
novices for example. Clearly, the simpler the task of discriminating geometric shape in representing
dimensions of a situation, the better for rapid abstract information transfer.

Smallman  and  colleagues  (Smallman  et  al.,  2001a;  Smallman  et  al.,  2001b)  have  suggested  an 
alternative  approach  for  representing  objects  within  displays.  They  have  developed  a  hybrid 
of  realistic  icons  and  abstract  symbols  that  they  call  Symbicons.  Some  examples  of  Symbicons 
are  displayed  in  Figure  6.  Symbicons  can  be  infinitely  differentiated  and  yet  retain  features 
such  as  semantic  shape  associations  that  are  believed  to  aid  LTM  information  encoding  and 
retrieval. The results of a visual search experiment suggested that Symbicons provided an
advantage over both realistic icons and abstract symbols for speed and accuracy of
categorization (Smallman et al., 2001b). However, the results of this study were not statistically
significant. The authors suggest that this may be because the number of different platform
types used in the study was limited. We should therefore be careful not to draw conclusions
from this work until further research has been conducted in this area. Symbicons appear to
lack  some  of  the  redundancy  in  coding  features  that  is  characteristic  of  typical  abstract 
symbols  (e.g.  a  naval  symbol  representing  enemy  vessel  uses  both  colour  and  shape). 

Figure 6. Examples of icons (left), symbols (center) and symbicons (right). From Smallman et al.
(2001b).

2.2.4 Decision making heuristics and models

Traditionally,  human  decision-making  has  been  thought  of  as  classically  rational.  In  other 
words,  it  is  assumed  that  when  an  individual  makes  a  decision,  they  first  search  the 
environment  for  all  salient  information,  attach  some  sort  of  weight  to  this  information,  and 
then  weigh  up  the  comparative  utility  of  making  each  of  the  choices  available  to  them.  It  is 
generally  recognised  that  there  are  problems  associated  with  the  classically  rational  approach. 
First,  it  is  extremely  time  consuming.  Second,  it  assumes  unlimited  memory  capacity.  Third, 
empirical  research  suggests  that  people  do  not  behave  like  this.  Research  indicates  that 
humans  rely  heavily  upon  heuristics  and  biases  when  making  decisions  (Arkes  &  Hammond, 
1986).  Rather  than  weighing  up  all  of  the  information  that  is  available  within  a  given 
environment, individuals appear to base decisions upon a limited number of salient cues
(Todd & Gigerenzer, 1999).

The  amount  of  information  used  to  make  a  decision  appears  to  be  related  to  two  factors:  the 
cost  of  making  a  wrong  decision,  and  the  structure  of  information  within  the  environment. 
Simple,  low  cost  decisions  appear  to  require  less  information  than  complex,  high  cost 
decisions  (Lee  &  Cummins,  submitted).  However,  in  certain  environments  information  is 
structured  in  such  a  way  that  it  is  not  necessary  to  carry  out  exhaustive  searches  regardless  of 
the importance of the decision. If the first piece of information is important enough, it may not
matter what the weights of any subsequent cues are.

Research  into  threat  cues  indicates  that  certain  cues  appear  to  be  more  significant  than  others. 
For  example,  platform  type,  weapon  envelope  and  electronic  emissions  were  found  to  be  the 
three  most  salient  cues  for  threat  assessment  by  experienced  surface  warfare  navy  officers 
(Liebhaber  &  Feher,  2002).  Research  may  be  able  to  determine  the  most  important  cues 
employed  by  display  users  across  a  range  of  the  most  frequent  decision-making  tasks.  By 
explicitly  representing  only  the  most  salient  cues,  it  may  be  possible  to  reduce  search  time  and 
display  cluttering,  and  thereby  reduce  cognitive  load.  Additionally,  it  may  be  possible  to 
develop  simple  algorithms  capable  of  assessing  conditional  states  such  as  threat  status,  which 
could  then  be  used  to  alert  display  users  of  potential  threats.  Todd  and  Gigerenzer  (1999) 
provide  a  number  of  fast  and  frugal  heuristics  that  are  capable  of  closely  replicating  human 
decision-making  that  may  be  suitable  for  such  a  process. 
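
As an illustration of how a fast and frugal heuristic might drive such an alert, the following Python sketch applies a lexicographic rule in the style of the "take the best" heuristic described by Todd and Gigerenzer (1999) to a pair of contacts. The cue order follows the three cues reported by Liebhaber and Feher (2002). The numeric cue scores, the contact records and the function name are hypothetical and are chosen only for illustration; they are not taken from either source.

```python
# Cues ordered from most to least salient (order taken from Liebhaber & Feher, 2002).
CUE_ORDER = ["platform_type", "weapon_envelope", "electronic_emissions"]


def more_threatening(a, b, cue_order=CUE_ORDER):
    """Return the contact judged more threatening, or None if no cue discriminates.

    Cues are inspected one at a time. The first cue on which the two contacts
    differ decides the comparison; all later cues are ignored.
    """
    for cue in cue_order:
        if a[cue] != b[cue]:
            return a if a[cue] > b[cue] else b
    return None


# Hypothetical cue scores: 1 = threatening value on that cue, 0 = benign value.
contact_a = {"name": "A", "platform_type": 1, "weapon_envelope": 0, "electronic_emissions": 0}
contact_b = {"name": "B", "platform_type": 0, "weapon_envelope": 1, "electronic_emissions": 1}

print(more_threatening(contact_a, contact_b)["name"])  # prints "A"
```

Contact A is selected even though contact B scores higher on two of the three cues, because the first cue already discriminates. This is the non-compensatory behaviour described above: once an important cue decides, the weights of subsequent cues do not matter.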

Another influential theory to emerge in recent years is so-called Naturalistic Decision-Making,
based on Klein's Recognition Primed Decision-making (RPD) model. This argument holds
that decision-making depends on the schematic representation of a "situation" as a mental
model within a framework of cognitive schemata (Lipshitz & Ben Shaul, 1997). Such a
decision process is shown in Figure 7. Here, schemata mediate both information search and
interpretation,  helping  the  decision-maker  quickly  identify  meaning.  Consequently,  accurate 
mental  models  (i.e.  an  understanding  of  'reality'  that  accurately  reflects  important 
components  of  'reality')  are  likely  to  be  critical  components  of  successful  decision-making. 
Experts  are  able  to  recognise  patterns  and  contingencies  in  a  situation  and  decide  very 
quickly. 

Figure 7. Lipshitz and Ben Shaul's Schemata-driven mental modelling component of Klein's RPD
model

2.2.5  Gestalt  grouping  principles 

According to Helander (1987), display designers should be aware of Gestalt
principles  such  as  proximity  and  similarity  when  grouping  display  information.  Viewers  tend 
to  perceive  regions  or  objects  of  the  same  shape  or  colour  as  distinct  groups  or  entities 
(Kosslyn,  1994).  Additionally,  objects  that  are  enclosed  within  a  region  by  a  linear  boundary 
are  perceived  as  belonging  together  (Palmer,  1999).  These  principles  can  be  used  to  inform 
display  design  so  that  information  can  be  grouped  in  an  immediately  interpretable  manner. 
For example, Wickens and Carswell (1995) suggest that if "similar information processing is
required for distinct display elements, then increasing their spatial proximity or surrounding
them with boundaries should improve performance", and Brand
and Orenstein (1998) have found empirical evidence supporting this.

Figure  8  displays  an  example  of  Gestalt  grouping  principles  being  applied  to  display  design. 
The  icons  of  functions  that  are  closely  related  are  situated  in  close  proximity  to  each  other  and 
are  enclosed  within  a  common  region.  For  example,  the  "print"  and  "print  preview"  icons  are 
positioned  next  to  each  other  within  a  region  that  is  separate  to  the  "new  document",  "open 
document"  and  "save  document"  icons.  Additionally,  it  may  be  useful  to  use  combinations  of 
these  principles  to  represent  hierarchies  of  function  similarities.  One  set  might  be  organized 
horizontally,  and  another  set  of  related  functions  organized  vertically,  with  sub-groupings 
enclosed  by  linear  boundaries. 

Figure 8. An example of Gestalt grouping principles being applied to display design. From the
Microsoft Word toolbar.

2.2.6  Colour  and  Cognition 

There  are  a  number  of  cognitive  factors  that  must  be  considered  if  colour  is  to  be  used  in 
visual  displays.  Numerous  sources  indicate  that  display  designers  should  take  into  account 
the  semantic  associations  of  colours  (Helander,  1987;  Kosslyn,  1994a).  For  example,  colours  are 
associated  with  certain  objects  or  types.  The  sea  is  generally  associated  with  the  colours  blue 
or green, and therefore should not be depicted as red or yellow. Similarly, land should be
depicted as either green or brown. By aligning the properties of the display with users'
expectations, search time and confusion can be minimized.

Colours are also associated with abstract concepts. For example, red is often associated with
danger,  yellow  with  caution,  and  green  with  safety.  A  number  of  different  colour  coding 
schemes  exist.  The  US  Department  of  Defence  military  standards  specify  a  colour  coding 
scheme  for  use  in  visual  displays  in  which  red  is  used  to  indicate  inoperative  systems,  flashing 
red  indicates  emergency  situations,  yellow  advises  caution,  green  indicates  a  fully  operational 
system,  and  white  represents  transient  conditions  (Helander,  1987).  Additionally,  there  is 
some evidence to suggest that colours may be associated with affective states. For example,
blue is associated with tranquillity, and orange with excitation (Grandjean, 1987; Norman &
Scott, 1952). These associations may be important if colour is used to convey
information, for example to indicate possible areas for manoeuvring in order to maximize passive
sonar readings. If red is used to indicate the best or optimal area because it is a sonar 'hotspot',
it may cause confusion because red can also mean danger.

Another possible problem involves colour-scales being used to convey semantic information
(that is, meaning, as opposed to the lower level perceptual aspects discussed in 2.1.2). A number
of sources stress that display designers should avoid using colour hue to indicate quantitative
information because hue is not a psychological continuum (Kosslyn, 1994a). One way of
conceptualising the relationships between different colours is in terms of physical properties.
In this way they can be thought of as being arranged along a vertical scale, moving from red at
a wavelength of 674 nm through orange (651 to 600 nm), yellow (584 nm), green (504 nm),
and blue (472 nm) to violet at 434 nm. However, multidimensional scaling of human subjects'
colour similarity ratings has revealed that the mental representation of colour is more like a
wheel or circle than a vertical scale (Shepard, 1962). In other words, red and violet (the ends of
the visible electromagnetic spectrum) are not at opposite ends of a uniform linear semantic
scale; they are as similar to each other as red is to orange, and violet is to blue (Figure 8).
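
The difference between a linear and a circular arrangement of hue can be made concrete with a small calculation. The Python sketch below compares the distance between two hues measured along a line with the distance measured around a wheel. The hue angles are assumptions taken from a conventional 360-degree hue wheel (red at 0, violet at 270); they are not perceptual measurements and are not the coordinates of the multidimensional scaling solution discussed above.

```python
def linear_distance(h1, h2):
    """Distance between two hue angles when hue is treated as a straight scale."""
    return abs(h1 - h2)


def circular_distance(h1, h2):
    """Distance between two hue angles when hue is treated as a wheel (degrees)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


# Hypothetical hue angles in degrees on a conventional hue wheel.
HUES = {"red": 0.0, "orange": 30.0, "blue": 240.0, "violet": 270.0}

print(linear_distance(HUES["red"], HUES["violet"]))    # 270.0
print(circular_distance(HUES["red"], HUES["violet"]))  # 90.0
```

On the straight scale red and violet are the most widely separated pair, whereas on the wheel they are closer to each other than red is to blue. A display that maps a quantity onto the full hue range therefore risks making the lowest and highest values look alike.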


[Figure: plot of points labelled with colour names: scarlet, red, orange, purple, violet, brown, bronze, tan, blue, yellow, khaki, turquoise, olive, green.]

Figure 8. Multi-Dimensional Scaling Representation of Ekman's (1954) Colour Similarity Data.


One possible solution might be to convey quantitative difference using a limited selection of
colours from the colour wheel, for example yellow through green to blue. Bisantz et al.
provide an example colour-scale, shown in Figure 9 (Bisantz et al., 2002). The scale in Figure 9
is intended to represent ten levels of probability, ranging from a 5% to a 95% chance of the
presence of mines. This scale seems intuitive when viewed as a set. However,
there is no guarantee that, when viewed individually, the colours will be easily interpreted without the aid
of a reference chart. Colour hue discrimination appears to be more suitable for representing
category level scaling than interval or ratio level scales.
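
A restricted colour-scale of this kind might be generated as in the following Python sketch, which maps a probability onto one of ten discrete colours running from yellow through green to blue. This is not the scale used by Bisantz et al. (2002); the hue end points (60 and 240 degrees), the use of fully saturated colours and the function name are assumptions made for illustration.

```python
import colorsys


def probability_to_rgb(p, levels=10, hue_start=60.0, hue_end=240.0):
    """Map a probability in [0, 1] to one of `levels` discrete RGB colours.

    Hue runs from hue_start (yellow) to hue_end (blue) in equal steps,
    so only part of the colour wheel is used.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    index = min(int(p * levels), levels - 1)  # bin index 0 .. levels - 1
    hue = hue_start + (hue_end - hue_start) * index / (levels - 1)
    r, g, b = colorsys.hsv_to_rgb(hue / 360.0, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(probability_to_rgb(0.05))  # (255, 255, 0), yellow
print(probability_to_rgb(0.95))  # (0, 0, 255), blue
```

Binning the probability into a small number of steps reflects the point made above: hue is better suited to conveying categories than fine interval or ratio differences.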


Figure  9:  An  example  colour-scale.  From  (Bisantz  et  al.,  2002). 


2.2.7  Auditory,  tactile  and  olfactory  display  enhancement 

A  number  of  studies  have  focused  upon  the  use  of  auditory  enhancements  for  visual  displays. 
A  common  use  of  auditory  enhancement  has  been  as  warning  signals  (attention  grabbing). 
Early  research  has  indicated  that  subjects  attend  to  auditory  warning  signals  faster  than  visual 
warnings (Stirner, Siegel, & Baker, 1957), and it has been suggested that auditory warning
signals  are  particularly  useful  for  situations  in  which  the  visual  system  is  overburdened 
(Sorkin,  1987).  In  addition  to  general  warning  signals,  stereophonically  presented  auditory 
cues  have  been  used  as  an  aid  to  visual  search.  For  example,  a  number  of  researchers  have 
found  that  aurally  presented  cues  can  aid  localization  in  visual  search  but  may  not  be  useful 
in  representing  complex  information  (Flanagan,  McAnally,  Martin,  Meehan,  &  Oldfield,  1998; 
Perrott,  Cisneros,  McKinley,  &  D'Angelo,  1996),  and  similar  aural  cues  have  been  successfully 
used  in  applied  settings  such  as  fighter-jet  cockpits  (Sorkin,  1987). 

Research  indicates  that  tactile  stimulation  may  be  suitable  for  drawing  attention  to  changes  in 
state such as automated shifting between engine modes. Skylar and Sarter (1999) found that
wrist mounted tactile stimulation was able to significantly improve awareness of state
changes compared to a visual condition. Investigations of the use of
tactile enhancement as an aid in localization tasks have been less successful. Gilliland and
Schlegel (1994) investigated the use of head-mounted tactile stimulators
that produced a light tapping on the scalp as an aid in detection and localization tasks but
found that it interfered with task performance. One possible reason that tactile stimulation
has not been found to be useful for localization is that it is difficult to abstract. A sound or a
flashing  light  is  interpreted  as  a  signal  to  orient  to  the  direction  of  its  origin;  in  other  words  a 
sound  that  appears  to  be  coming  from  the  left  is  a  signal  to  look  left.  In  contrast  to  this, 
somatosensory  or  tactile  information  appears  to  be  interpreted  as  a  signal  to  orient  to  where 
the  individual  is  being  touched,  and  because  of  this,  it  appears  that  even  with  training  it  may 
not  be  a  useful  enhancement  device  for  localization  tasks. 

There  are  few  examples  of  the  successful  application  of  olfactory  displays  and  even  fewer  that 
can  be  considered  relevant  to  sub-surface  decision  making  environments.  As  Sanders  and 
McCormick  (1993)  indicate,  there  are  a  number  of  potential  problems  with  olfactory  displays. 
First, while the olfactory system is extremely sensitive, it is also prone to false alarms. In other
words,  subjects  tend  to  report  the  presence  of  an  odour  when  no  odour  has  been  presented 
(Richardson  &  Zucco,  1989).  Second,  studies  have  indicated  that  subjects  are  able  to 
distinguish  between  different  odours,  but  have  difficulty  identifying  specific  odours  when 
presented  in  isolation  (Desor  &  Beauchamp,  1974).  Third,  subjects  have  difficulty  in 
distinguishing  between  odours  that  differ  in  intensity  (Engen,  1982).  The  last  two  points 
highlight  the  inherent  difficulty  of  encoding  multi-dimensional  data  as  olfactory  stimuli. 


2.2.8  Cognitive  Factors:  Summary 

This  section  outlined  a  number  of  cognitive  factors  that  may  influence  display  utility: 

•  Attention  limitations  and  cognitive  load. 

•  Short-term  memory  limitations. 

•  Long-term  memory  limitations  and  object  representation. 

• Factors associated with colour-coding such as semantic associations and internal
representations  of  colour  similarity. 

•  The  relevance  and  possible  uses  of  auditory,  tactile  and  olfactory  display  enhancements. 

2.3 Ecological Factors Influencing Display Utility

Here the term ecological refers to the real-world setting of display use. All real-world decision-making
occurs within a unique system or environment (Todd & Gigerenzer, 1999), defined by
factors  such  as  the  type  and  quality  of  information  that  is  available,  the  way  this  information 
is  organized,  the  capabilities  of  the  various  machines  and  individuals  interacting  within  the 
environment,  and  the  types  of  decisions  that  need  to  be  made.  The  decision-making 
environment  can  be  thought  of  as  an  interactive  system  because  the  various  factors  that  define 
the  environment  impart  varying  degrees  of  influence  upon  each  other.  For  example,  if  access 
to  information  is  restricted,  then  this  may  influence  the  ability  of  the  user  to  make  accurate 
decisions.  Alternatively,  ample  but  poorly  organized  data  pose  potential  problems  of  their 
own,  such  as  cognitive  overload. 

Visual  displays  currently  being  used  as  decision  aids  in  undersea  warfare  resemble  the  types 
of  displays  conventionally  used  in  aerial  and  surface  warfare.  Typically  the  display  provides  a 
representation  of  a  platform's  immediate  physical  surrounds,  such  as  local  topography  and 
the  location,  heading  and  speed  of  neighbouring  platforms.  However,  there  are  a  number  of 
factors  that  differentiate  the  decision-making  environment  of  sub-surface  platforms  from 
other  types  of  platform.  For  example,  unlike  aerial  and  surface  platforms,  Collins  class 
submarines  rely  on  passive  sonar  to  determine  the  position,  speed,  heading  and  type  of  local 
targets. This means that there is often a degree of uncertainty with regard to these factors. Given
that  the  decision-making  environment  of  the  Collins  class  submarines  is  specific,  and 
differentiated  from  other  types  of  decision-making  environments,  it  is  important  that  the 
visual  displays  employed  as  decision-making  aids  be  tailored  to  suit  this  environment.  The 
following sections outline a number of ecological factors, or factors related to the decision-making
environment, that may influence display utility, and some of the associated
considerations.


2.3.1  Task  requirements 

Specific tasks performed by the platform, and the sorts of decisions that are required in
performing these tasks, should be a major consideration in display
design. To ensure that the displays support the decision-making process, display capabilities
should match task requirements as closely as possible. A comprehensive understanding of
display requirements can be obtained by assessing the types and frequencies of the various
operations (reconnaissance, aggressive actions, etc.) that are performed by the platform,
and the roles that the commanding officer and support staff play in their execution. A number
of different methods exist that would be suitable for obtaining this information, including
Task Analysis (Preece, 1994) and Cognitive Work Analysis (Vicente, 2002; Vicente &
Rasmussen, 1992).

Knowledge  of  the  specific  requirements  of  the  display,  or  displays,  to  be  developed  can  be 
used  to  govern  the  form  and  content  of  the  display.  For  example,  if  the  display  is  to  be  used 
for  tasks  involving  both  relative  position  judgments  and  shape  understanding,  then  it  may  be 
necessary  to  combine  both  2-D  and  3-D  display  formats,  or  allow  the  users  to  shift  between 
the  two  formats.  Additionally,  this  knowledge  can  be  used  to  determine  what  sort  of 
information needs to be represented explicitly. For example, in a physical display of a
platform  and  its  surroundings  it  may  be  useful  to  represent  features  such  as  political 
boundaries  and  shipping  lanes.  However,  as  discussed  in  the  section  on  cognitive  factors  that 
influence  display  utility,  display  clutter  can  lead  to  a  number  of  negative  effects  such  as 
cognitive  overload.  The  relative  importance  and  frequency  of  use  of  this  type  of  information 
can  be  used  to  determine  if  this  information  needs  to  be  explicitly  represented  in  the  display, 
or  if  it  needs  to  be  accessed  only  when  it  is  specifically  needed.  Additionally,  this  can  also  be 
used  to  determine  the  physical  form  of  the  display:  'hidden'  functions  that  are  used  more 
than  others  may  need  to  have  'hot'  buttons  that  provide  instant  access  to  the  data,  whereas 
space  limitations  may  mean  that  less  useful  or  less  frequently  used  functions  are  accessed  via 
a  menu. 
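
One simple way of turning this reasoning into an allocation rule is sketched below in Python. Each optional display function is given a priority score, and the highest-scoring functions receive the limited number of 'hot' buttons while the remainder are placed in a menu. The scoring rule (frequency of use multiplied by importance), the usage figures and the function name are assumptions made for illustration; in practice the inputs would come from a task analysis.

```python
def assign_access(functions, hot_slots):
    """Split display functions into hot-button and menu access.

    functions: dict mapping a function name to (uses_per_hour, importance),
               where importance is a rating between 0 and 1.
    hot_slots: number of hot buttons that the console has room for.
    Returns (hot_button_names, menu_names), each ordered by priority.
    """
    ranked = sorted(
        functions,
        key=lambda name: functions[name][0] * functions[name][1],
        reverse=True,
    )
    return ranked[:hot_slots], ranked[hot_slots:]


# Hypothetical usage data for four display overlays.
overlays = {
    "weapon range envelopes": (12, 0.9),
    "uncertainty ellipses": (20, 0.8),
    "shipping lanes": (3, 0.5),
    "political boundaries": (1, 0.4),
}

hot, menu = assign_access(overlays, hot_slots=2)
print(hot)   # ['uncertainty ellipses', 'weapon range envelopes']
print(menu)  # ['shipping lanes', 'political boundaries']
```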

Research  has  indicated  that  the  way  that  data  are  organized  within  information  hierarchies 
influences  the  types  of  actions  that  are  taken  by  users.  Vicente  and  Rasmussen  argue  that 
interface users can encounter three types of situations: routine, anticipated situations; non-routine
but anticipated situations; and non-routine, unanticipated situations (Vicente, 2002;
Vicente & Rasmussen, 1992). Display interfaces that are only used in routine, anticipated
situations only need a limited repertoire of functions, as the range of situations the user faces
is likewise limited. In routine and anticipated situations the user only needs access to a limited
body of information. However, in unanticipated situations, which often require adaptive
problem solving-type behaviour, users may need to have access to a wider range of data and
functions.

Specific  knowledge  of  display  requirements  can  also  be  used  to  determine  how  data  should  be 
organized. The hierarchical structure within which data are organized determines the pathways
travelled in order to access information. It is imperative that data be organized in a manner
that is plausible with regard to the types of tasks performed. As previously mentioned, it may
be useful for display users to be able to superimpose information such as shipping lanes,
political borders, weapons range envelopes, and uncertainty ellipses onto the conventional
display. There are a number of ways that this information may be organised. One way would
be to have a button with an appropriate label such as "Display Enhancements". Pressing this
button would provide the user with a menu of the various enhancement types, each of which
represents a sub-group of functions or enhancements. However, if certain functions are more
important, or are accessed more often than others, then this may not be the most appropriate
way  of  organising  the  information.  Additionally,  it  may  be  useful  for  users  to  have  access  to 
the  data  being  displayed  on  the  other  consoles  within  the  operations  room.  This  provides  the 
possibility  of  information  being  integrated  at  lower  levels  in  the  chain  of  information,  thereby 
easing  the  cognitive  load  further  up  the  chain. 

2.3.2  Information  Type 

The  type  and  quality  of  available  information  will  influence  display  utility.  Sub-surface 
platforms  rely  heavily  upon  passive  sonar  to  obtain  information  about  target  location,  speed, 
heading  and  type.  This  imposes  restrictions  upon  the  degree  of  confidence  that  can  be 
associated  with  the  data.  The  representation  of  data  in  conventional  visual  displays  is  based 
upon  the  assumption  that  the  positions  of  target  platforms  in  relation  to  ownship  are  an 
accurate  description  of  platform  locations  in  the  real  world.  However,  given  that  the  type  of 
information available within sub-surface decision making environments is for the most part
uncertain, the representations of these data should reflect the uncertainty. If the locations of
target platforms are portrayed as being unproblematic, this may lead to users making
decisions based upon inaccurate information. Alternatively, if the degree of confidence in a
target's position or type is explicitly represented, it seems reasonable to expect, though it is not
necessarily the case, that the user will take such information into account when making a decision.

A number of possibilities exist for representing uncertainty in a display. For example, location
uncertainty can be represented by a circle or sphere centred on each icon or symbol. The size of the sphere
or circle could be determined by a Gaussian probability distribution of possible
locations. This would provide an indication of the level of uncertainty associated with the
target. Larger circles would indicate a broader distribution of uncertainty. Likewise, shape, say
an ellipse, could represent directional variance (skewing of the Gaussian scheme) arising in
many sensors. Additionally, using probability distributions to represent uncertainty would
allow the user to make assumptions about the location of the target within the sphere or circle
of uncertainty, with location probability decreasing as a non-linear function of distance from
centre. A number of researchers have tested the usefulness of uncertainty representations such
as these. Studies by Andre and Cutler (1998, as cited in Bisantz et al., 2002) and Kirschenbaum
and Arruda (1994) both used ellipses or circles to represent uncertainty regarding platform
location. Both studies found that subject performance improved significantly when using
displays capable of representing uncertainty. St. John, Callan, Proctor and Holste (2000) used
irregular ellipses or 'blobs' to convey uncertainty with regard to position and heading. Results
indicated that use of the uncertainty representation was able to improve recall of the distance
and direction of future enemy positions (St. John, Callan, Proctor, & Holste, 2000a).
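
The circle and ellipse representations described above can be derived from a Gaussian model of position error. The Python sketch below computes the semi-axes and orientation of the ellipse expected to contain the target with a chosen probability, given the variances and covariance of the position estimate. It is a minimal illustration under the assumption of a bivariate Gaussian error; it does not reproduce the method of any of the studies cited, and the function name and example values are hypothetical.

```python
import math


def uncertainty_ellipse(var_x, var_y, cov_xy, confidence=0.95):
    """Return (semi_major, semi_minor, angle_deg) of a Gaussian confidence ellipse.

    var_x, var_y and cov_xy are the entries of the 2x2 position covariance
    matrix. The angle is that of the major axis, measured from the x axis.
    """
    # For a 2-D Gaussian the squared Mahalanobis distance is chi-square
    # distributed with 2 degrees of freedom, whose quantile is -2 ln(1 - p).
    scale = -2.0 * math.log(1.0 - confidence)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    mean = (var_x + var_y) / 2.0
    spread = math.hypot((var_x - var_y) / 2.0, cov_xy)
    lam_major = mean + spread
    lam_minor = max(mean - spread, 0.0)  # guard against rounding below zero
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (
        math.sqrt(scale * lam_major),
        math.sqrt(scale * lam_minor),
        math.degrees(angle),
    )


# Hypothetical contact: position error is larger along x than along y.
print(uncertainty_ellipse(4.0, 1.0, 0.0))
# Equal, uncorrelated errors reduce the ellipse to a circle.
print(uncertainty_ellipse(1.0, 1.0, 0.0))
```

With equal variances and no correlation the two semi-axes coincide, giving the circular representation; unequal or correlated errors give an ellipse elongated along the direction of greatest uncertainty.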

A number of alternative approaches have also been suggested. In addition to circular
uncertainty representations, Andre and Cutler (1998, cited in Bisantz et al., 2002) also
compared numeric and arc representations of uncertainty with regard to platform heading. The
numerical representation involved assigning each platform a numerical value representing
heading uncertainty, and the arc representation displayed a curved line that covered the range
of possible movement headings (the greater the heading uncertainty, the greater the
corresponding arc). The results indicated that each of the three uncertainty representations
was able to improve subject performance in comparison to a no-representation condition. St.
John,  Callan,  Proctor  and  Holste  (2000)  used  simple  symbology  to  indicate  three  states  of 
uncertainty  (high,  medium  and  low)  in  a  tactical  decision  making  exercise.  Results  indicated 
that  the  degree  of  uncertainty  associated  with  each  unit  influenced  the  types  of  decisions 
made.  No  empirical  data  are  available  comparing  the  effectiveness  of  the  displays  using 
uncertainty  representations  to  displays  without  uncertainty  (St.  John  et  al.,  2000a). 

Bisantz et al. (2002) suggested that blurring or degradation could be used to represent
uncertainty. The degree of uncertainty could be represented by the degree to which an icon
was degraded, with high degradation representing high uncertainty. Although this technique
is yet to be empirically tested, there are a number of reasons for questioning its utility. In the
first  place,  as  has  been  previously  mentioned,  blurred  or  out  of  focus  objects  can  be  distracting 
and  might  possibly  cause  fatigue  as  the  visual  system  struggles  to  bring  the  blurred  image 
into  focus.  Secondly,  the  use  of  blurring  or  degradation  effectively  removes  useful  information 
from  the  display:  designers  should  be  aiming  at  representing  uncertainty  within  a  display,  not 
creating it. Similarly, some researchers have warned against the use of linguistic
representations  of  uncertainty  (such  as  probable,  doubtful,  likely),  as  they  are  open  to 
individual  interpretation,  and  are  therefore  unsuitable  for  representation  as  a  psychological 
continuum  (Bisantz  et  al.,  2002). 

There is much that is not yet understood about the manner in which people think about
the uncertainty of information, and about how supporting such thinking through display design might
assist decision-making. This would seem to offer a fruitful direction for future research.


2.3.3  Physical  environment 

The  structure  of  the  physical  environment  in  which  the  displays  are  to  be  used  may  also 
impact upon display utility. For example, studies have indicated that stereoscopic and
immersed or virtual reality displays are generally more precisely interpreted than screen-based
perspective projection displays (Wickens et al., 2000). This research will be addressed in
more detail in a later section. Not surprisingly, however, the physical limitations of the sub-surface
decision making environment may preclude the use of goggle or headset viewed
displays.  Additionally,  audio  cues  have  been  used  to  supplement  situation  awareness  in  a 
number  of  different  applied  settings  (Sanders  &  McCormick,  1993).  However,  a  device  such  as 
this,  that  has  been  found  to  be  useful  in  the  cockpits  of  single-man  fighter  jets,  may  hinder 
operations  in  an  environment  where  a  number  of  individuals  are  working  together  in  close 
proximity. 


2.3.3.1 Ecological Interface Design

One  area  of  research  that  has  evolved  from  the  demand  for  theoretical  tools  to  govern  the 
design  of  computer  displays  is  ecological  interface  design  (EID)  (Vicente,  2002;  Vicente  & 
Rasmussen,  1992).  EID  has  been  primarily  used  in  the  field  of  industrial  design.  Studies 
comparing  EID  and  conventional  interfaces  in  areas  such  as  pasteurisation,  neonatal  intensive 
care,  and  thermal  hydraulic  process  control  micro-worlds  have  indicated  that  interfaces 
designed  by  employing  EID  can  perform  better  than  interfaces  designed  by  other,  more 
conventional,  means  (Vicente,  2002).  Although  it  has  been  suggested  that  EID  should  only  be 
applied  to  work  domains  that  are  governed  by  physical  laws  and  not  human  intentions 
(Wong,  1997),  there  are  a  limited  number  of  studies  in  which  EID  has  been  applied  to 
command  and  control  situations  (Chalmers,  Easter,  &  Potter,  2000;  Vicente,  2002). 

Chalmers,  Easter  and  Potter  (2000)  employed  EID  to  develop  a  threat  management  display  for 
a  Canadian  Halifax  Class  Frigate  (Chalmers  et  al.,  2000).  The  display  was  intended  to  provide 
an  overview  of  the  tactical  situation  using  a  combination  of  functional  information  (such  as 
threat  identification,  and  time  until  likely  weapons  release)  and  physical  information  (physical 
location,  heading,  speed)  that  was  capable  of  representing  air,  surface  and  sub-surface 
warfare.  Rather  than  attempting  to  integrate  functional  information  into  an  already  crowded 
physical  display,  Chalmers  et  al  (2000)  developed  two  integrated  displays  (Figure  10).  On  the 
right  is  a  conventional  2-D  geo-physical  display,  and  on  the  left  a  threat  overview  display. 
Figure  11  provides  a  close-up  of  the  threat  overview  display  demonstrating  a  number  of  its 
functions. Target platforms are sorted according to type (air, surface, subsurface or missile),
contact state (unresolved, suspect, hostile, assigned or engaged) and temporal distance from
ownship.  The  threat  status  of  target  platforms  can  be  seen  to  increase  as  they  move  closer  to 
the  bottom  left  hand  corner  of  the  screen.  Chalmers  et  al  (2000)  suggest  that  the  display  has 
the  potential  to  increase  situation  awareness  by  explicitly  representing  all  target  platforms 
within  a  hierarchical  structure,  organized  according  to  task  priority.  However,  as  yet,  the 
display  has  not  been  tested  empirically.  The  work  is  presented  here  to  demonstrate  the 
concept  of  EID. 


Figure  10.  Integrated  display  for  Halifax  Class  Frigate.  From  Chalmers  et  al  (2000). 

Bisantz et al (2002) provide a number of examples of displays designed as tactical decision-making
aids that were developed using cognitive task analysis (CTA). Like the display
developed by Chalmers et al (2000), each of the displays combines geo-physical and functional
information and was developed to support the specific needs of the user. Only one of the
displays,  a  visualization  of  combat  readiness  for  an  Army  command  application  (Talcott, 
Bennet,  Martinez,  and  Stansifer,  2001  cited  in  Bisantz  et  al,  2002)  had  been  tested  empirically: 
Bisantz  et  al  (2002)  indicate  that  subjects  using  the  EID  display  demonstrated  superior 
performance  in  comparison  to  subjects  using  an  alternative  digital  display. 
Figure 11. Threat Overview Display. From Chalmers et al (2000).


2.3.4  Ecological  Factors:  Summary 

This section outlined a number of ecological factors that may influence display utility:

•  The  effect  of  task  requirements  upon  display  design. 

•  The  effect  of  information  type  and  quality  upon  display  utility,  and  the  case  for  representing 
uncertainty. 

•  Restrictions  imposed  by  the  physical  environment. 

•  One method of capturing such factors in design is Ecological Interface Design.
3. The Comparative Utility of Two-Dimensional Versus
Three-Dimensional  Displays 

Recent  technological  advances  have  meant  that  a  variety  of  3D  display  alternatives  have 
become  available  in  addition  to  conventional  2D  displays.  Alongside  these  technological 
advances  has  been  a  growing  body  of  research  exploring  the  practical  implications  of  the  new 
display  technology.  One  area  that  has  received  particular  interest  concerns  the  comparative 
utility  of  2-D  and  3-D  displays. 


3.1  Display  Dimensionality 

Various  methods  have  been  developed  for  presenting  three-dimensional  displays.  A  number 
of methods, such as stereoscopic and immersive (or virtual reality) imaging, involve the use of
goggles  or  a  head-set.  Alternatively,  three-dimensional  perspective  imaging  can  be  created  on 
standard  computer  monitors  or  view  screens.  While  there  is  some  evidence  to  suggest  that 
stereoscopic  and  immersed  imaging  are  superior  to  view-screen  perspective  displays  in  terms 
of image clarity (Wickens, Thomas, & Young, 2000), a number of studies have reported
side-effects such as nausea and dizziness after the use of goggles or head-sets (Regan & Price,
1994).  Additionally,  in  many  applied  settings  the  use  of  view-screens  is  preferable  for  practical 
reasons:  for  example,  many  people  can  potentially  view  a  single  monitor,  whereas  only  one 
person  has  access  to  the  information  presented  on  an  individual  head-set.  A  further  argument 
in  favour  of  perspective  displays  is  that  they  are  computationally  less  costly.  For  these  reasons 
the  majority  of  research  into  3D  display  imaging  has  focused  upon  perspective  displays. 

An  obvious  advantage  that  3-D  displays  have  over  2-D  displays  is  the  inherent  ability  to 
represent  explicitly  all  three  spatial  dimensions  within  a  single  image.  Users  of  3-D  displays 
have  immediate  access  to  information  concerning  latitude,  longitude  and  altitude,  whereas  in 
2-D  displays  data  relating  to  one  of  the  three  dimensions  must  be  omitted,  or  encoded  either 
digitally  or  through  analogue  markings  such  as  contour  lines.  Because  of  this,  2-D  displays 
might  possibly  impose  a  greater  cognitive  load  upon  the  viewer  as  they  shift  their  attention 
between  the  different  information  types.  In  light  of  this,  it  has  been  theorised  that  for 
integrated  attention  tasks,  or  tasks  requiring  the  integration  of  information  concerning  all 
three  spatial  axes  (such  as  aerial  or  undersea  navigation),  3-D  displays  should  show  an 
advantage  over  2-D  displays.  No  such  distinction  between  2-D  and  3-D  display  formats 
should  be  obvious  for  separable  attention  tasks  in  which  attention  need  only  be  focused  upon 
one  dimension  (Haskell  &  Wickens,  1993).  Although  this  distinction  seems  intuitively 
plausible,  empirical  studies  have  indicated  that  it  provides  a  poor  prediction  of  comparative 
display  utility.  For  example,  St.  John,  Cowen,  Smallman  and  Oonk  (2001a)  summarised  16 
recently  published  studies  in  which  comparisons  were  made  between  2-D  and  3-D  displays  in 
applied  aviation  settings.  The  experimental  tasks  were  classified  as  requiring  either  integrated 
or  separable  attention,  and  of  the  11  studies  classified  as  integrated,  only  4  found  a  3-D 
advantage,  whereas  5  found  a  2-D  advantage  and  2  were  tied. 
St. John et al suggest that the lack of evidence for an integrated task 3-D advantage may be
because  the  distorting  effects  of  3-D  displays  outweigh  any  benefit  of  integrated  viewpoints 
when  making  precise  relative  position  judgements  (St.  John  et  al.,  2001a).  3-D  displays  are 
prone  to  three  distortion  effects:  First,  foreshortening  or  slant  underestimation,  whereby  the 
relative  angles  of  objects  are  distorted  according  to  the  relative  distance  between  the  object 
and the viewer. For example, a hillside that slopes away from the viewer at a 45-degree angle
will  appear  to  be  more  acute  the  closer  it  is  to  the  viewer.  Secondly,  in  order  to  represent  three 
dimensions  upon  a  two-dimensional  screen,  at  least  two  of  the  spatial  axes  must  be 
compressed.  As  a  result  of  this,  parallel  lines  will  appear  to  converge  as  they  move  away  from 
the  viewer,  and  the  foreground  may  be  positioned  lower  on  the  display  screen  than  the 
background or horizon. Thirdly, as Figure 12 illustrates, 3-D displays are subject to what is
termed "line-of-sight ambiguity". This uncertainty relates to the undefined size-distance
scaling across all three spatial dimensions (that is, the geometry of the
image can be identical at any number of different viewpoints of a scene; relative motion
appears to help us avoid such uncertainty in natural vision). In a top-down 2-D display,
by contrast, all information regarding vertical distances is lost, but distances along the two
horizontal axes can be represented without distortion.


[Figure 12 panels. 2-D Display: x, y certain; z unknown. 3-D Display: x, y, z all somewhat uncertain.]


Figure  12.  A  comparison  of  line-of-sight  ambiguity  in  2-D  and  3-D  displays.  From  Smallman,  St. 
John  and  Cowen  (2002). 


Unlike  3-D  displays,  2-D  displays  are  scaled  linearly.  As  a  result  of  this,  the  relationships 
between  angles  and  distances  within  a  target  environment  are  preserved  within  the  display. 
For  example,  lines  that  are  parallel  in  the  target  environment  will  be  parallel  within  the 
display.  Although  2-D  displays  cannot  provide  a  comprehensive  integrated  representation  of 
an  environment,  the  viewer  can  assume  that  the  representation  is  accurate.  In  this  way,  the 
relationship  between  2-D  and  3-D  displays  can  be  thought  of  in  terms  of  a  trade-off  between 
information  quantity  and  information  quality. 
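
The geometric contrast described above can be made concrete with a small numerical sketch. The code is illustrative only: it assumes an idealised pinhole viewer at the origin looking along the +y axis, with z as the vertical axis, and the function names (perspective, top_down, screen_gap, plan_gap) are hypothetical. It is not drawn from any of the studies cited.

```python
def perspective(point):
    """Pinhole projection for a viewer at the origin looking along +y.

    Assumed convention (not from the source): x is lateral, y is depth,
    z is vertical. Screen coordinates are (x / y, z / y).
    """
    x, y, z = point
    if y <= 0:
        raise ValueError("point must lie in front of the viewer (y > 0)")
    return (x / y, z / y)


def top_down(point):
    """Linearly scaled plan view: keep x and y, discard the vertical z."""
    x, y, _z = point
    return (x, y)


# Line-of-sight ambiguity: two different points on the same ray from the
# viewer land on the same screen location in the perspective view.
near = (1.0, 2.0, 1.0)
far = (2.0, 4.0, 2.0)
same_in_3d = perspective(near) == perspective(far)

# The plan view separates near and far, but it cannot distinguish points
# that differ only in the vertical dimension.
low = (1.0, 2.0, 1.0)
high = (1.0, 2.0, 5.0)
same_in_2d = top_down(low) == top_down(high)


def screen_gap(depth):
    """Perspective-view separation of two parallel ground lines at a given depth."""
    left = perspective((-1.0, depth, -1.0))
    right = perspective((1.0, depth, -1.0))
    return right[0] - left[0]


def plan_gap(depth):
    """Plan-view separation of the same two lines at a given depth."""
    left = top_down((-1.0, depth, -1.0))
    right = top_down((1.0, depth, -1.0))
    return right[0] - left[0]


print(same_in_3d, same_in_2d)
print(screen_gap(2.0), screen_gap(10.0))  # the gap shrinks with depth
print(plan_gap(2.0), plan_gap(10.0))      # the gap stays constant
```

In this toy setting the perspective view trades positional certainty on all three axes for an integrated picture, whereas the plan view keeps two axes exact and drops the third, which is one way of reading the quantity versus quality trade-off.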
Given the relative strengths and weaknesses of the two display types, St. John and colleagues
(St.  John  &  Cowen,  1999a,  1999b;  St.  John  et  al.,  2001a;  St.  John,  Oonk,  &  Cowen,  2000b;  St. 
John,  Smallman,  Bank,  &  Cowen,  2001b;  St.  John,  Smallman,  Oonk,  &  Cowen,  2000c)  proposed 
an  alternative  method  of  determining  which  tasks  might  be  better  served  by  either  display. 
They  argue  that  3D  displays  are  most  useful  for  tasks  that  require  shape  understanding,  such 
as  matching  an  out-of-cockpit  view  of  terrain  with  a  2D  or  3D  digital  display,  whereas  tasks 
that  require  precise  relative  position  judgements,  such  as  predicting  if  two  objects  will  collide 
on  their  present  courses,  benefit  from  2D  displays. 

St.  John  et  al  tested  their  hypotheses  by  evaluating  subject  performance  upon  two  abstract 
visuo-perceptual  tasks:  object  recognition  and  relative  positioning  (St.  John  &  Cowen,  1999a, 
1999b). In both experiments the subjects were presented with either a 3D perspective view or
a set of three 2D plan views (top, front and side) of simple 3D block shapes composed of 10 to
16  cubes  (Figure  13).  In  the  first  experiment,  the  subjects  were  required  to  identify  the  target 
object  from  a  set  of  spatially  rotated  alternatives.  The  subjects  were  able  to  perform  more 
accurately  and  faster  when  provided  with  a  3D  perspective  view.  In  the  second  experiment, 
the  subjects  were  required  to  indicate  the  position  of  a  sphere  in  relation  to  the  block  shapes. 
The  2D  display  was  found  to  be  superior  to  the  3D  display  in  terms  of  response  times  and 
accuracy. 



Figure  13.  Example  of  test  stimuli  from  St.  John  et  al  (1999).  The  position  of  the  sphere  in  relation  to 
the cubes is ambiguous in the 3-D display. However, the 3-D display provides a superior
indication  of  the  shape  of  the  block-object. 


For  the  most  part,  experimental  findings  across  a  range  of  abstract  and  applied  settings  have 
replicated  the  results  of  this  experiment.  We  will  now  review  the  literature  comparing  2-D 
and  3-D  displays  in  light  of  St.  John  and  colleagues'  distinction  between  relative  position 
judgements  and  shape  understanding. 
3.1.1 Tasks Involving Judgements of Relative Position

Tittle,  Woods,  Roesler,  Howard  &  Phillips  (2001)  performed  a  meta-analysis  on  eleven 
separate  studies  (McKee,  Levi  &  Bowne,  1990;  Graham,  1965;  Phillips  &  Todd,  1996;  Oomes  & 
Dijkstra,  In  Press;  Fechner,  1860/1966;  Lapin  &  Furqua,  1983;  McKee  &  Welch,  1992; 
Westheimer,  1975;  Tittle,  Todd,  Perotti  &  Norman,  1995;  Kramher  &  Fahle,  1996;  Norman  & 
Lappin,  1992:  all  cited  in  Tittle  et  al,  2001)  in  which  discrimination  accuracy  was  compared  for 
2D  and  3D  versions  of  seven  different  abstract  visuo-perceptual  tasks:  judgements  of  relative 
distance, relative curvature, global object orientation, distance bi-section, relative size,
co-linearity or co-planarity, and flat versus curved categorisation. 2D visualisations were found to
be  better  than  3D  across  all  tasks,  with  the  greatest  differences  being  found  for  tasks  requiring 
precise quantitative judgements rather than qualitative judgements. Alexander and Wickens
(2002) cite nine studies involving position estimation tasks in which an advantage was found using
2-D displays (McGreevy & Ellis, 1986; Tharp & Ellis, 1990; Kim, Ellis, Tyler, Hannaford, &
Stark, 1987; Barfield & Rosenberg, 1995; Wickens, 1995; Wickens & May, 1994; Yeh &
Silverstein, 1992; Boyer, Campbell, May, Merwin, & Wickens, 1995: all cited in Alexander &
Wickens, 2002). Additionally, St. John et al (2001a) reviewed 14 studies from the aviation
literature  involving  judgments  of  relative  position:  seven  studies  found  a  2-D  advantage, 
three  found  a  3-D  advantage  and  four  found  no  significant  difference. 

St.  John  and  colleagues  performed  a  series  of  experiments  involving  displays  of  realistic 
terrain.  Subjects  were  required  to  perform  a  number  of  different  visuo-perceptual  tasks  using 
three  different  display  conditions:  a  topographical  2D  display,  a  45  degree  3D  perspective 
display  and  a  90  degree  topographical  3D  perspective  display.  Performance  on  a  task 
requiring  judgements  of  relative  position,  in  which  the  subjects  indicated  which  of  two  points 
was  higher,  was  found  to  be  significantly  better  using  the  2D  display  than  the  two  3D 
displays. Additionally, there was no significant difference in reaction time among the three
display conditions. This is interesting because it indicates that despite having to search for
digital information regarding altitude, subjects using the 2D display were no slower than those using the 45 degree
perspective display in which the vertical dimension was explicitly represented (St. John et al.,
2000b).  Similarly,  the  2D  topographical  display  was  found  to  be  significantly  more  accurate 
and  faster  than  the  3D  displays  for  a  task  in  which  subjects  estimated  the  latitude,  longitude 
and  altitude  distances  between  two  points  (St.  John  et  al.,  2000c). 

In a similar manner, Wickens, Thomas and Young (2000) compared subject
performance on a series of applied visuo-perceptual tasks involving topographic (2D), 45
degree exocentric (CRT screen) perspective and immersed (virtual) perspective displays of
realistic terrain. Replicating the findings of St. John et al (2000b, 2000c),
judgements of relative distances were found to be significantly more accurate using the
2D  display.  It  is  interesting  to  note  that  the  subjects  expressed  near-equal  confidence  in  their 
judgements  across  the  three  display  conditions,  regardless  of  accuracy.  This  is  particularly 
worrisome  as  it  indicates  that  when  using  the  3D  displays  the  subjects  believed  they  were 
performing  more  accurately  than  they  actually  were. 
3.1.2 Tasks Involving Shape Understanding

Studies  of  tasks  that  can  be  categorised  as  involving  shape  understanding  can  be  classified 
into  two  groups:  view-matching  tasks  and  line-of-sight  tasks.  The  number  of  studies  involving 
applied  shape  understanding  tasks  is  much  smaller  than  for  relative  position  tasks  and,  as 
such,  it  is  difficult  to  interpret  the  findings  with  any  degree  of  confidence.  Despite  this,  the 
studies  appear  to  indicate  that  there  may  be  a  time  advantage  for  perspective  displays  on 
shape  understanding  tasks:  subjects  using  perspective  displays  for  shape  understanding  tasks 
require  less  time  to  produce  responses  that  are  equal  in  accuracy  to  responses  made  using  2-D 
displays. 

St. John et al (2000b) compared a 2-D topographical display with both a 90 degree (top-down 3-D
display)  and  a  45  degree  perspective  3-D  display  (viewpoint  represented  by  perspective 
projection  at  45  degrees  from  a  central  image  axis)  for  a  view-matching  task  in  which  the 
subjects  were  required  to  match  a  designated  section  of  a  display  with  one  of  four  immersed 
perspective  views  of  terrain  (Figure  14).  No  significant  difference  in  accuracy  was  found 
between  the  three  displays,  although  performance  using  the  3D  displays  was  found  to  be 
significantly  faster  than  performance  using  the  2D  display  (St.  John  et  al.,  2000b).  In  a  similar 
study  by  Hickox  &  Wickens  (1999)  in  which  subjects  were  required  to  match  an  out-of-cockpit 
view  with  either  a  2-D  topographical  or  a  perspective  display,  the  results  indicated  a  3-D 
advantage. In contrast to these two studies, Green and Williams (1992; cited in Lasswell &
Wickens,  1995)  found  that  subjects  using  perspective  displays  for  view-matching  were  slower 
and  less  accurate  than  subjects  using  2-D  displays. 


Figure 14. Example of test stimuli from a view-matching task. From St. John et al (2000b).
St. John et al (2000b) found no significant difference in accuracy between 2-D topographical,
and  45  and  90  degree  perspective  displays  for  a  line-of-sight  task,  in  which  subjects  were 
required  to  indicate  if  a  designated  point  was  visible  from  another  point,  or  if  it  was  occluded 
by  the  terrain.  Once  again,  judgments  made  using  the  45  degree  perspective  display  were 
found  to  be  significantly  faster  than  when  using  the  2D  display,  but  no  significant  difference 
in  reaction  time  was  found  between  the  2D  and  90  degree  perspective  display  (St.  John  et  al., 
2000b).  Additionally,  in  a  task  requiring  multiple  line-of-sight  judgments  using  either  a  2D 
topographical  display  or  a  45  degree  perspective  display,  no  significant  difference  in  accuracy 
was  found  between  the  two  views,  but  subjects  employing  the  perspective  display  were  found 
to  be  significantly  faster  (St.  John  et  al.,  2001b).  In  contrast  to  these  findings,  a  similar  study 
performed  by  Wickens  et  al  (2000)  found  no  significant  difference  between  2-D  and 
perspective  displays  for  either  accuracy  or  response  time. 
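
For readers unfamiliar with the line-of-sight task, the judgement the subjects were asked to make can be expressed as a simple geometric test. The following is a minimal sketch under stated assumptions, not the procedure used in any of the cited studies: terrain is reduced to a list of elevations sampled at equal intervals along the straight path between the two points, and the function name has_line_of_sight and its parameters are hypothetical.

```python
def has_line_of_sight(profile, observer_height=0.0, target_height=0.0):
    """Return True if the last sample is visible from the first.

    profile: terrain elevations at equally spaced samples along the straight
        path from observer to target (an assumed simplification).
    observer_height, target_height: heights above the terrain at each end.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least an observer and a target sample")
    last = len(profile) - 1
    start = profile[0] + observer_height
    end = profile[-1] + target_height
    for i in range(1, last):
        # Elevation of the straight sight line above sample i.
        sight = start + (end - start) * i / last
        if profile[i] > sight:
            return False  # terrain rises above the sight line
    return True


valley = [10.0, 5.0, 10.0]   # ground dips between the two points
ridge = [10.0, 20.0, 10.0]   # a ridge stands between the two points

print(has_line_of_sight(valley))  # visible across the valley
print(has_line_of_sight(ridge))   # occluded by the ridge
print(has_line_of_sight(ridge, observer_height=15.0, target_height=15.0))
```

The test compares elevations at intermediate points with the elevation of the sight line, so even in this reduced form it depends on relative altitude judgements as well as on an appreciation of terrain shape.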

Although  the  literature  has  provided  less  than  conclusive  evidence  of  a  3-D  advantage  for 
shape-understanding  tasks,  this  may  in  part  be  due  to  the  type  of  tasks  studied.  On  the  one 
hand,  it  appears  that  relatively  few  applied  tasks  actually  involve  shape  understanding.  On 
the  other,  it  could  be  argued  that  shape-understanding  tasks  such  as  the  line-of-sight 
judgments  also  involve  a  degree  of  relative  position  judgment  (e.g.,  the  relative  altitudes  of 
two  antennae  towers  in  relation  to  the  altitude  of  a  ridge-line  (St.  John  et  al.,  2000b))  and  this 
may  lessen  any  advantage  provided  by  the  3-D  displays. 


3.1.3  Display  Augmentation 

One  approach  to  improving  performance  has  been  to  augment  perspective  displays.  Typically 
the  aim  of  display  augmentation  has  been  to  remove  or  reduce  the  ambiguity  associated  with 
perspective  displays.  For  example,  a  number  of  researchers  have  sought  to  minimize  distance 
ambiguity  through  the  use  of  drop-lines,  shadows  or  by  scaling  the  size  of  platforms  in 
relation  to  distance  from  viewpoint. 
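
As a rough illustration of why drop-lines can reduce distance ambiguity, consider the sketch below. It assumes an idealised pinhole viewer raised above a flat ground plane at z = 0; the viewer height, the platform coordinates and the function names (project, drop_line) are hypothetical and are not taken from the studies discussed in this section.

```python
EYE_HEIGHT = 10.0  # assumed height of the viewpoint above the ground plane


def project(point):
    """Pinhole projection for a viewer at (0, 0, EYE_HEIGHT) looking along +y."""
    x, y, z = point
    if y <= 0:
        raise ValueError("point must lie in front of the viewer (y > 0)")
    return (x / y, (z - EYE_HEIGHT) / y)


def drop_line(platform):
    """Screen-space end points of a drop-line.

    Returns the projected platform and the projection of the point directly
    beneath it on the ground plane (z = 0).
    """
    x, y, _z = platform
    return project(platform), project((x, y, 0.0))


# Two platforms on the same ray from the viewer: B is twice as far away
# and lower than A, so their symbols coincide on the screen.
platform_a = (2.0, 4.0, 6.0)
platform_b = (4.0, 8.0, 2.0)

top_a, foot_a = drop_line(platform_a)
top_b, foot_b = drop_line(platform_b)

print(top_a, top_b)    # identical screen positions without augmentation
print(foot_a, foot_b)  # the drop-line feet differ, separating the two cases
```

In this example the un-augmented symbols are indistinguishable, while the foot of each drop-line marks a distinct ground position and the length of the line gives a cue to altitude.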

St.  John  and  colleagues  found  that  a  2-D  topographical  display  was  significantly  more 
accurate  and  faster  than  both  a  90  degree  top-down  3-D  display  and  a  45  degree  3-D  display 
for  a  task  in  which  subjects  estimated  the  latitude,  longitude  and  altitude  distances  between 
two  points  (St.  John  et  al.,  2000c).  However,  when  grid  and  contour  lines  were  added  to  the 
displays,  performance  on  the  90  degree  3-D  display  improved  to  the  level  of  the  2-D  display. 
Performance  on  the  45  degree  display  did  not  reach  the  same  level  as  performance  on  the  2-D 
display,  but  the  proportion  of  correct  responses  did  more  than  double. 

Smallman,  Schiller  and  Cowen  (2000)  performed  a  study  of  display  augmentation 
techniques  in  which  subjects  were  required  to  reconstruct  the  relative  positions  of  objects 
within  either  2-D  or  3-D  displays.  Platforms  in  the  3-D  displays  were  either  un-augmented,  or 
were  augmented  with  either  drop-lines,  drop-shadows  or  size-scaling  (maintaining  relative 
size-distance  proportions).  Performance  using  the  2-D  display  was  found  to  be  significantly 
superior  to  the  un-augmented  3-D  display.  Size-scaling  did  not  improve  localization  accuracy, 
but  the  addition  of  either  drop-lines  or  drop-shadows  improved  subject  performance  to  the 
point  that  there  was  no  significant  difference  between  performance  using  the  drop-line  or 
drop-shadow augmented displays and performance using the 2-D display (Smallman et al.,
2000).

Although relatively few studies have focused upon display augmentation, the early
indication is that performance on relative position tasks using 3-D displays can be improved
to equal or nearly equal the level of performance using 2-D displays. However, two points
should also be considered. First, these findings are as yet unreplicated. Secondly, Smallman
et  al  warn  that  the  potential  benefits  of  augmentation  may  be  offset  by  a  rise  in  cognitive  effort 
due  to  excess  display  clutter  (Smallman  et  al.,  2000). 

3.1.4  "Orient  and  Operate" 

As  previously  mentioned,  many  of  the  applied  tasks  for  which  visual  displays  are  used 
appear  to  involve  elements  of  both  shape  understanding  and  relative  position  judgments. 
Given  that  2-D  displays  appear  to  be  better  suited  to  relative  position  tasks,  and  3-D  displays 
better  suited  to  shape  understanding  tasks,  this  presents  a  dilemma  in  terms  of  choosing  the 
most  suitable  display  format  for  the  task.  St.  John  and  colleagues  (St.  John  et  al.,  2001b)  have 
suggested  that  one  possible  solution  is  to  adopt  a  strategy  that  they  call  "orient  and  operate" 
in  which  users  can  switch  between  display  types.  In  this  way  a  perspective  view  can  be  used 
to  obtain  orientation  within  the  target  environment,  and  a  2-D  view  can  be  used  to  perform 
operations  involving  precise  quantitative  judgments.  The  "orient  and  operate"  format  was 
tested  in  an  experiment  in  which  subjects  positioned  a  series  of  antennae  towers  within  a 
realistic  terrain  environment,  such  that  the  towers  were  within  line-of-sight  of  each  other  but 
hidden from enemy emplacements. Each problem had to be solved within a four-minute time
period. Subjects were presented with one of three display conditions: a topographical 2-D
display,  a  45  degree  perspective  display  or  a  side-by-side  display  in  which  both  2-D  and 
perspective  views  were  simultaneously  shown.  The  results  indicated  that  the  side-by-side 
condition  led  to  the  fastest  solution  times  and  the  fewest  number  of  incomplete  problems. 

There  are  a  number  of  unresolved  problems  associated  with  changing  between  2-D  and 
perspective  view  formats,  such  as  maintaining  associations  between  display  elements  as  the 
view  changes,  and  how  platforms  should  be  represented  in  2-D  and  3-D  displays.  Despite  this, 
the idea of designing displays that are capable of providing interchangeable 2-D and 3-D views
seems  promising.  Submarines  operating  in  littoral  environments  typically  perform  both 
relative  position  and  shape  understanding  tasks.  For  example,  it  is  necessary  to  constantly 
monitor  the  positions  of  entities  within  the  waterspace,  and  this  can  be  seen  as  a  relative 
position  task.  Additionally,  display  users  are  also  required  to  perform  shape  understanding 
tasks  such  as  determining  constraining  parameters  like  undersea  topography,  bathymetry  and 
possible  courses  of  action  that  may  pose  a  threat.  Rather  than  compromise  the  display  users' 
ability  to  perform  either  of  these  types  of  tasks,  it  may  be  better  to  provide  an  interchangeable 
display. 


3.1.5  Relative  Utility  of  2-D  and  3-D  Displays:  Summary 

This  section  summarised  research  findings  pertaining  to  the  influence  of  display 
dimensionality  upon  display  utility: 
•  A number of theoretical concepts relating to display dimensionality were outlined, such as
integrated  versus  separable  attention,  and  relative  position  versus  shape  understanding  tasks. 

•  Research  has  indicated  that  2-D  displays  may  be  better  for  tasks  involving  judgements  of 
relative  position,  and  3-D  displays  may  be  better  for  tasks  involving  shape  understanding. 

•  It  may  be  possible  to  lift  performance  on  relative  position  tasks  on  3-D  displays  through  the 
use  of  display  augmentation. 

•  An alternative solution is to allow users to change between 2-D and 3-D views according to
specific  task  requirements. 
4. Relevance of the Generative Transformational
Approach  to  Visual  Perception 

Theories  of  visual  perception  seek  to  account  for  the  processing  of  the  non-symbolic  aspect  of 
images.  In  contrast,  many  questions  concerning  display  design  have  to  do  with  the 
combination  of  symbolic  data  and  information  that  is  more  directly  representational.  Even  in 
the  case  of  the  latter  type  of  information,  the  representation  may  be  of  a  conventional  kind 
(e.g.,  the  use  of  a  battleship  icon),  rather  than  one  dictated  by  purely  optical  constraints,  as  in 
natural  perceptual  situations.  Because  the  relations  between  the  processing  of  visual 
information and that of symbolic representations concern two highly complex areas, there is
no  well-developed  theory  encompassing  both.  As  a  result,  there  is  no  over-arching  body  of 
theory  that  is  immediately  applicable  to  the  design  of  visual  information  displays.  However, 
theoretical  accounts  of  visual  perception  have  been  developed  that  have  some  bearing  on  the 
design  and  layout  of  both  symbolic  and  representational  visual  displays. 

So  far  as  the  perceptual  aspect  of  displays  is  concerned,  it  is  convenient  to  distinguish  three 
main  approaches.  These  focus  on  successive  stages  of  the  processing  of  visual  information  by 
considering  (1)  the  geometry  of  projected  images,  (2)  the  responses  by  specific  cells  and 
channels,  and  (3)  the  contribution  by  cognitive  processes.  Recently,  Vickers  (2001;  2002)  has 
developed  a  generative  transformational  (GT)  theory  that  attempts  to  bring  together  these 
different  approaches.  The  generative  transformational  theory  of  visual  perception  and  the 
general  properties  of  the  generative  transformational  approach  are  outlined  in  Appendix  1. 
This  section  outlines  the  relevance  of  generative  transformational  theory  to  display  design  and 
suggests  a  number  of  possible  research  directions. 

4.1  A  generative  transformational  approach  to  human  information 
processing. 

The  GT  approach  is  implemented  in  a  computational  model  that  gives  a  quite  comprehensive 
account  of  the  visual  perception  of  structure,  motion  and  depth.  The  model  is  productive  and 
yields  quantitative  predictions  for  a  wide  variety  of  situations.  In  principle,  the  GT  approach 
is  potentially  relevant  to  a  wide  range  of  purely  perceptual  questions  concerning  display 
design.  The  following  is  a  tentative  classification  of  the  general  kinds  of  questions  for  which 
the  GT  approach  might  have  some  application: 

1.  Questions  concerning  the  display  of  information  concerning  relative  position  (e.g.,  whether 
2D  or  3D  displays  are  more  effective). 

Because  the  GT  approach  is  specifically  concerned  with  information  about  relative 
position,  this  involves  a  fairly  straightforward  application  of  the  theory. 

2.  Questions  concerning  the  recognition  of  shapes  at  different  orientations. 

Because  the  GT  approach  is  concerned  with  finding  the  simplest  transformation  that 
will  map  one  part  of  an  image  onto  another,  or  on  to  another  image,  such  questions 
also  involve  quite  natural  applications  of  the  theory. 
3. Questions concerning optimal display layout.

As  noted  earlier,  it  is  frequently  recommended  that  designers  of  display  layouts 
should  be  aware  of  Gestalt  principles,  such  as  proximity,  continuity  and  regularity 
when  positioning  display  information  (Helander,  1987;  Wickens  &  Carswell,  1995). 
A  possible  advantage  of  the  GT  approach  is  that  it  provides  an  integrated  and 
quantitative  underpinning  for  the  above  Gestalt  principles.  In  particular,  the  GT 
approach  can  provide  an  explanatory  framework  in  which  to  consider  effects  of  the 
following  factors: 

a.  Rescaling 

b.  Element  density 

c.  Clustering,  regularity  and  symmetry 

d.  Continuity,  closure  and  colinearity 

e.  Boundary  shape 

f.  Order  of  scanning 

4.  Questions  concerning  the  relationships  between  perception  and  action. 

Because the GT approach considers perceived organisation as corresponding to
something  like  an  implicit  plan  for  action,  this  approach  may  have  some  relevance 
to  problems  concerning  the  compatibility,  or  otherwise,  between  display 
presentation  and  possible  action. 

5.  Questions  concerning  the  integration  of  internal  structure  and  overall  shape  and  orientation. 

The  GT  approach  uses  a  variety  of  statistics  on  information  about  relative  position  to 
select  transformations  that  provide  an  economical  specification  of  internally 
structured  shapes  at  particular  orientations.  Although  this  aspect  of  the  GT  theory  is 
still  undeveloped,  the  general  direction  that  such  development  might  take  would 
probably  involve  some  compromise  between  economy  and  accuracy  in  specifying 
internal  structure. 


4.2  Two  illustrations  of  the  possible  relevance  of  the  generative 
transformational  approach  to  display  design. 

The  GT  approach  has  been  developed  to  account  for  the  results  of  laboratory  studies  relevant 
to  theories  of  visual  perception,  rather  than  to  address  the  problems  of  display  design.  The 
theory  has  also  been  primarily  addressed,  in  the  first  instance,  to  the  results  of  experiments 
employing  spatial  point  patterns.  However,  there  is  no  reason,  in  principle,  why  the  theory 
cannot be extended to deal with bitmap images, for example. Similarly, the theory can be
generalised  to  deal  with  any  set  of  elements,  with  multiple  sets  of  elements  and  with  elements 
that  have  unequal  weights. 

Meanwhile,  there  are  two  principal  aspects  of  the  theory  that  seem  most  relevant  to  the 
problems  of  display  design.  The  first  concerns  the  dependence  of  the  theory  on  information 
about  the  relative  positions  of  image  elements.  The  second  concerns  the  selection  of 
transformations  that  maximise  self-similarity  within  a  representation  or  between  two  spatially 
or temporally distinct representations.

First, evidence reviewed in sections 3.1 and 3.2 indicates that performance on tasks
requiring judgements of relative position tends to be better with 2D than with 3D displays.
This is qualitatively in line with what might be expected on a GT approach, for the following
reasons.

Interactions  between  events  tend  to  occur  between  neighbouring  events.  Accordingly,  the  GT 
approach  makes  use  of  a  set  of  techniques,  termed  nearest  neighbour  analysis  (NNA).  These 
techniques  differentiate  between  rival  hypotheses  about  the  processes  that  underlie  the  spatial 
distribution  of  these  events.  NNA  achieves  this  differentiation  by  characterising  the 
distribution of distances between similar events that are nearest (second-nearest, ..., k-nearest)
neighbours  and  contrasting  this  with  the  distribution  that  would  be  expected  if  the  events  in 
question  were  distributed  in  a  completely  random  manner. 
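
The comparison described above can be sketched in a few lines. This is a minimal illustration only, not the authors' procedure: it assumes elements in a unit square, ignores edge effects, and summarises the contrast with randomness by the Clark-Evans ratio (observed mean nearest neighbour distance divided by the mean expected for a completely random array of the same density), a statistic that the report itself does not name. The three arrays are hypothetical.

```python
import math
import random


def nearest_neighbour_distances(points, k=1):
    """Return, for each point, the distance to its k-th nearest neighbour."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(others[k - 1])
    return distances


def clark_evans_ratio(points, area):
    """Observed mean NN distance divided by its expectation under randomness.

    Values near 1 suggest a random array, values below 1 clustering and
    values above 1 regularity. Edge effects are ignored in this sketch.
    """
    observed = sum(nearest_neighbour_distances(points)) / len(points)
    density = len(points) / area
    expected = 1.0 / (2.0 * math.sqrt(density))
    return observed / expected


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Three hypothetical 400-element arrays in (or near) the unit square.
random_array = [(rng.random(), rng.random()) for _ in range(400)]
centres = [(rng.random(), rng.random()) for _ in range(8)]
clustered_array = [
    (cx + rng.gauss(0, 0.02), cy + rng.gauss(0, 0.02))
    for cx, cy in centres
    for _ in range(50)
]
regular_array = [
    ((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)
]

for name, array in [
    ("random", random_array),
    ("clustered", clustered_array),
    ("regular", regular_array),
]:
    print(name, round(clark_evans_ratio(array, area=1.0), 2))
```

Passing k=2, 3, ... to nearest_neighbour_distances gives the second-nearest and higher-order distances mentioned in the text; comparing the full distributions, rather than only their means, would be the natural next step.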

NNA  provides  a  powerful  analytic  framework  for  the  detection  of  structure,  motion  and 
depth  in  arrays  of  elements,  random  or  otherwise.  For  example,  in  machine  vision,  Zahn 
(1971)  has  presented  an  analysis  of  the  detection  of  clusters  in  terms  of  nearest  neighbours 
(NNs).  NN  relations  are  consistent  with  the  view,  proposed  by  some  cognitive  theorists,  that 
perception  organises  stimuli  in  an  optimum  way  (e.g.,  producing  the  shortest  path).  NN 
structures  are  independent  of  the  order  in  which  the  elements  are  considered  or  an  array  is 
scanned.  NN  structures  have  considerable  redundancy,  and  so  are  insensitive  to  moderate 
amounts  of  noise.  Evaluations  of  applications  of  NNA  to  automatic  image  classification  in  a 
military  context  have  concluded  that  it  provides  an  efficient  tool  for  scene  analysis,  with 
performance  that,  like  that  of  human  beings,  shows  only  a  graceful  degradation  as  more  noise 
is  added  to  a  pattern  (Singh,  Haddon,  &  Markou,  1999). 

Although  no  investigation  has  yet  been  undertaken  of  the  effect  of  perspective 
transformations  on  the  distribution  of  NNs,  preliminary  explorations  with  shears  suggest  that 
this  distribution  is  relatively  insensitive  (but  not  invariant)  to  shears  applied  to  planar  arrays. 
This  goes  some  way  towards  explaining  the  apparent  superiority  of  2D  displays  of 
information  about  relative  position.  Because  NN  distributions  remain  invariant  under 
similarity  transformations,  2D  displays,  in  which  variations  are  restricted  to  similarity 
transformations,  would  be  expected  to  produce  more  accurate  performance  than  3D  displays 
that  incorporate  perspective  transformations,  and  hence  some  distortion  of  the  NN 
distributions  -  particularly  if  the  rotation  of  a  three-dimensional  shape  results  in  the  occlusion 
of  different  parts  of  that  shape. 
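
The invariance claim can be checked numerically. The sketch below is an illustration under stated assumptions rather than the explorations referred to above: it uses a hypothetical random planar array, normalises the NN distances by their mean so that uniform rescaling drops out, and uses a simple shear as a stand-in for a non-similarity distortion (it does not model perspective or occlusion).

```python
import math
import random


def nn_distances(points):
    """Distance from each point to its nearest neighbour."""
    return [
        min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for i, (xi, yi) in enumerate(points)
    ]


def normalised_profile(points):
    """Sorted NN distances divided by their mean (a scale-free summary)."""
    d = sorted(nn_distances(points))
    mean = sum(d) / len(d)
    return [x / mean for x in d]


def similarity(points, angle, scale, dx, dy):
    """Rotate, uniformly rescale and translate the array."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
        for x, y in points
    ]


def shear(points, k):
    """Horizontal shear: x is displaced in proportion to y."""
    return [(x + k * y, y) for x, y in points]


def max_profile_change(a, b):
    """Largest pointwise difference between two normalised NN profiles."""
    return max(
        abs(p - q) for p, q in zip(normalised_profile(a), normalised_profile(b))
    )


rng = random.Random(1)
array = [(rng.random(), rng.random()) for _ in range(150)]

similarity_change = max_profile_change(array, similarity(array, 0.7, 2.5, 3.0, -1.0))
shear_change = max_profile_change(array, shear(array, 1.0))

print("change under similarity transformation:", similarity_change)
print("change under shear:", shear_change)
```

The change under the similarity transformation is at the level of floating-point rounding, whereas the shear produces a measurable change in the profile.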

A  second  general  finding  from  comparisons  of  2D  and  3D  displays,  discussed  in  section  2.3,  is 
that  subjects  required  less  time  to  recognise  transformed  shapes  when  these  were  presented  in 
a  3D  perspective  view  than  when  the  shape  information  was  presented  as  a  set  of  three  2D 
plan  views. 

This  finding  is  broadly  consistent  with  a  GT  approach  to  visual  perception.  For  example,  if  an 
outline  shape  is  transformed  in  such  a  way  as  to  be  consistent  with  the  projection  of  a  rotation 
in  depth  of  that  shape,  then  there  are  certain  relations  between  array  elements  that  remain 
invariant  and  can  be  used  to  select  and  guide  the  transformation  required  to  match  the 
original with the transformed shape.

Specifically, if lines joining contour points to the centroid of the two-dimensional projection of
an outline shape are continued, so that they intersect the contour on the opposite side, then
the ratio of the first distance to the second remains the same under rigid
transformations of the shape. As this diameter is incrementally rotated, successive values of
the  ratio  provide  a  waveform  that  is  characteristic  for  that  shape,  irrespective  of  its  orientation 
in  three  dimensions.  This  invariance  can  be  used  to  guide  the  search  for  a  transformation  that 
will  match  the  transformed  shape  with  the  original.  A  similar  technique  has  been  used  by 
Vickers,  Lee  and  Chandrasena  (2002)  for  the  automatic  recognition  of  logos  that  have  been 
degraded  by  noise  and  subject  to  various  transformations. 
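
A minimal sketch of the ratio waveform follows. It is not the procedure of Vickers, Lee and Chandrasena (2002); it assumes a convex polygonal outline, takes the centroid to be the mean of the vertices, and checks only the simplest case, a rotation in the image plane, for which the waveform is unchanged apart from a circular shift. The polygon and the function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of the contour points (assumed definition of the centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def ray_distance(origin, angle, polygon):
    """Distance from origin, along direction angle, to the polygon boundary."""
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    best = None
    for i in range(len(polygon)):
        (px, py), (qx, qy) = polygon[i], polygon[(i + 1) % len(polygon)]
        ex, ey = qx - px, qy - py
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this edge
        t = ((px - ox) * ey - (py - oy) * ex) / denom  # distance along the ray
        u = ((px - ox) * dy - (py - oy) * dx) / denom  # position along the edge
        if t > 0 and -1e-9 <= u <= 1 + 1e-9:
            best = t if best is None else min(best, t)
    return best


def ratio_waveform(polygon, steps=36):
    """Ratio of the two centroid-to-contour distances for each diameter."""
    c = centroid(polygon)
    wave = []
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        first = ray_distance(c, angle, polygon)
        second = ray_distance(c, angle + math.pi, polygon)
        wave.append(first / second)
    return wave


def rotate(points, angle):
    """Rotate the points about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


polygon = [(0.0, 0.0), (4.0, 0.0), (5.0, 2.0), (2.5, 4.0), (-1.0, 2.0)]
steps, shift = 36, 5

original_wave = ratio_waveform(polygon, steps)
rotated_wave = ratio_waveform(rotate(polygon, 2 * math.pi * shift / steps), steps)

# After rotating by `shift` angular steps, the waveform is the same sequence
# of ratios, displaced by `shift` positions.
largest_mismatch = max(
    abs(rotated_wave[i] - original_wave[(i - shift) % steps]) for i in range(steps)
)
print("largest mismatch after re-aligning the waveforms:", largest_mismatch)
```

In a matching task the shift that best aligns two waveforms would itself give an estimate of the rotation relating the two shapes.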

The  case  when  the  image  is  a  planar  array  of  elements  has  not  yet  been  systematically 
investigated.  However,  preliminary  explorations  suggest  that  the  distribution  of  NN  distances 
(which  captures  the  internal  structure  of  the  array)  also  remains  relatively  unchanged,  despite 
radical  shearing.  This  means  that  the  search  for  a  transformation  to  relate  a  given  array  to  a 
transformed  version  can  also  be  constrained  (though  perhaps  less  tightly)  by  certain 
regularities. 

According  to  the  GT  approach,  the  visual  system  uses  such  constraints  to  guide  the  search  for 
a  transformation  that  maps  one  part  of  a  representation  on  to  another  or  that  maps  one 
representation  on  to  a  spatially  or  temporally  distinct  representation.  Successful  mapping 
corresponds  to  the  recognition  of  a  presented  shape  as  being  essentially  identical  in  outline 
and/or internal structure to a projected view of that shape at a different orientation.

When  information  about  a  three-dimensional  shape  is  conveyed  by  the  simultaneous  2D 
presentation  of  three  orthogonal  plans,  the  subject  (presumably)  has  to  mentally  rotate  the 
transformed  shape  so  that  one  or  more  projections  coincide  with  the  presented  plan(s). 
Alternatively,  the  subject  must  somehow  integrate  the  plans  into  a  3D  representation  that  can 
be  rotated  to  match  with  one  of  the  presented  possibilities.  If  we  make  the  reasonable 
assumption  that  information  about  inter-element  distances  is  processed  in  parallel,  but  that 
the  rotation  and  integration  processes  are  serial  and  effortful  cognitive  activities,  as  suggested 
by  the  results  of  a  number  of  studies  (Shepard  &  Cooper,  1982),  then  this  additional 
processing  load  would  explain  the  finding  that  shape  recognition  is  more  efficient  with  3D 
displays. 


4.2.1  Future  Research  Directions  for  the  Generative  Transformational  Approach 

Corresponding  to  the  two  main  applications  identified  above,  there  are  two  main  ways  in 
which  the  supporting  perceptual  theory  might  be  developed  to  provide  more  effective  and 
precise  guidance  for  display  design. 

The  first  concerns  the  detection  and  characterisation  of  structure  in  a  display  or  element  array. 
Nearest  neighbour  distributions  -  and  cumulative  distributions  and  radial-based  functions 
based  on  them  -  are  sensitive  to  the  presence  of  many  types  of  structure.  However,  it  is 
possible that an even more general approach, which would include nearest neighbours as a
special case, could be developed, based on the use of Voronoi diagrams and Delaunay
triangulation  (O'Rourke,  1994).  For  example,  recent  research  has  shown  that  human  beings 
are  capable  of  finding  near-optimal  solutions  to  visually  presented  optimisation  problems  that 
are  computationally  difficult  or  intractable  (Graham,  Joshi,  &  Pizlo,  2000;  Vickers,  Butavicius, 
Lee,  &  Medvedev,  2001).  Delaunay  triangulation  can  be  used  to  narrow  the  search  for  such 
minimum  structures  (e.g.,  minimum  spanning  trees  and  solutions  to  Euclidean  travelling 
salesman  problems),  and  the  cumulative  distributions  of  the  areas  of  Delaunay  triangles 
appear  to  differentiate  between  structure  and  randomness  -  and  between  different  types  of 
structure  -  in  a  very  efficient  manner. 

To  our  knowledge,  such  an  approach  has  not  been  considered  in  psychological  or 
neurophysiological  studies  of  visual  perception,  and  the  development  of  a  set  of  statistical 
techniques  -  and  a  corresponding  perceptual  model  -  based  on  Voronoi  diagrams  and 
Delaunay  triangulation  would  constitute  a  major  research  effort.  However,  we  have  already 
established  the  general  direction  that  such  an  investigation  might  follow  through  our 
development  of  techniques  of  nearest  neighbour  analysis.  There  are  also  strong  indications 
from  the  literature  on  computational  geometry  that  such  a  research  initiative  would  be 
extremely  productive  (e.g.,  Asano,  Bhattacharya,  Keil,  &  Yao,  1988;  Jaromczyk  &  Toussaint, 
1992). 

The  second  direction  for  further  research  concerns  the  integration  of  information  about  outline 
and  internal  structure  in  the  perception  of  shape  and  orientation  in  depth.  Because  no 
systematic  account  of  internal  structure  has  been  developed,  little  research  has  been  carried 
out  on  this  question.  It  is  known  that  highly  regular  structures  (unsurprisingly)  produce  more 
accurate  judgments  of  slant.  It  is  also  known  that  judgments  of  slant  are  influenced  by  element 
density.  However,  no  theoretically  motivated,  quantitative  analysis  has  been  carried  out  on 
the  relation  between  such  judgments  and  element  density  or  regularity.  The  techniques  of 
nearest  neighbour  analysis  provide  a  potentially  useful  tool  for  investigating  such  questions. 

4.2.2  Generative  Transformational  Approach:  Summary 

In  this  section  the  relevance  of  the  generative  transformational  theory  of  visual  perception  to 
display  design  was  outlined: 

• Some of the general kinds of questions for which the GT approach might have some application
were summarised, including questions concerning the display of information concerning
relative position, questions concerning the recognition of shapes at different orientations,
questions concerning optimal display layout, questions concerning the relationships between
perception and action, and questions concerning the integration of internal structure and
overall shape and orientation.

•  Two  principal  aspects  of  the  theory  that  seem  most  relevant  to  the  problems  of  display  design 
were  illustrated. 

• Potential future research directions were outlined.


5. Assessing Display Utility

5.1  Subjective  and  empirical  data 

Two  main  approaches  have  been  taken  to  assess  display  utility:  subjective  ratings  and 
empirical  testing.  In  the  following  section  some  of  the  advantages  and  disadvantages  of  these 
two  approaches  will  be  outlined  briefly. 

Subjective  ratings  are  introspective,  in  that  they  are  reliant  upon  test  subjects'  personal 
accounts  and  opinions.  Such  ratings  are  generally  obtained  through  simple  questionnaires  or 
interviews,  and  can  be  sampled  either  during  or  after  testing.  While  subjective  ratings  can  be 
useful,  there  are  restrictions  upon  the  types  of  data  that  they  can  supply  and  the  confidence 
with  which  the  data  can  be  interpreted.  In  the  first  place,  introspective  judgments  of 
performance  (such  as  asking  subjects  which  experimental  condition  they  believed  that  they 
performed  the  best  in)  should  not  be  used  as  a  replacement  for  empirical  measures  of  task 
performance,  as  perceptions  of  global  task  performance  are  likely  to  be  subject  to  memory 
limitations.  For  example,  overall  performance  may  be  high,  but  if  mistakes  are  more  frequent 
in  the  last  half  of  the  test  the  subject  may  perceive  global  performance  as  being  poor. 
Additionally,  if  subjects  do  not  receive  feedback  as  individual  tasks  are  completed,  rating  of 
performance  might  reflect  task  difficulty  rather  than  success  rate. 

Secondly, data about subject preference (for example, "do you prefer display A or display
B?") should be interpreted with caution for, as Andre and Wickens (1995, cited in St. John et
al., 2000b) note, people do not always want what is good for them. For example, in an
experiment comparing the performance of subjects on a visual identification task using either
2-D or 3-D displays, Baumann, Blanksteen and Dennehy (1997) found that subjects indicated a
preference for the 3-D displays even when their performance was significantly better using the
2-D display. Using subject preference ratings to
determine display design should probably be avoided, as it is difficult to determine if the
users are stating a preference in terms of aesthetics, utility, or some other criterion.

Thirdly,  the  use  of  subjective  ratings  raises  the  issue  of  reliability.  Because  subjective  ratings 
are  indications  of  individual  opinion,  there  is  no  way  of  ensuring  that  the  participants  are 
applying  the  same  standards  to  the  problem.  One  subject's  opinion  of  what  constitutes  a  good 
performance  might  be  completely  different  to  another  subject's.  Different  subjects  might  be 
applying  different  criteria,  such  as  perceived  difficulty  or  perceived  percentage  of  trials 
correct,  to  assess  performance.  Additionally,  there  may  be  variation  between  subjects  in  terms 
of  how  they  weight  their  assessments:  "good"  for  one  subject  may  be  "excellent"  or  "average" 
for  another.  Finally,  subjective  rating  is  open  to  assessment  variation  both  within  and  across 
trials. 

Bearing these problems in mind, it is important to note that subjective ratings can provide valid
and replicable data, provided that they are used in the right situations. One type of subjective
measure  that  has  proved  to  be  of  interest  in  a  number  of  areas  is  subject  self-rating  of 
confidence, and this will be explained in more detail below.

Empirical data are obtained from the observation of subject behaviour under experimental
conditions. The advantage of empirical data is that they are more objective and can often be
gathered unobtrusively. Consequently, empirical data can be considered to be a more accurate
indication  of  performance  than  data  that  are  gathered  using  subjective  measures. 
Additionally,  empirical  tests  are  able  to  assess  performance  in  terms  of  fixed  criteria  that  can 
be  applied  across  all  test  subjects  and  experimental  conditions,  thereby  allowing  meaningful 
comparisons  to  be  made  between  both  groups  and  individuals,  across  trials  and  across  time. 


5.2  Performance  measures 

A  number  of  different  performance  measures  are  available  for  assessing  display  utility. 
Perhaps  the  most  obvious  measure  of  performance  is  percentage  of  correct  responses,  usually 
referred  to  as  accuracy.  Accuracy  scores  can  be  obtained  for  individual  sub-tests,  which  can  in 
turn  be  combined  to  provide  a  measure  of  global  performance.  Endsley  (1995a)  cautions 
against  the  use  of  global  measures  when  assessing  situation  awareness,  as  an  overall 
performance  score  cannot  provide  information  about  the  various  areas  of  strength  or 
weakness.  For  example,  performance  may  be  high  in  all  areas  but  one,  resulting  in  an  overall 
high  score  but  providing  no  indication  of  the  area  of  weakness. 
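
The arithmetic behind this caution is easy to illustrate. In the sketch below the sub-test names and the response records are invented purely for illustration; they are not data from any study.

```python
def accuracy(responses):
    """Proportion of correct responses (True counts as correct)."""
    return sum(responses) / len(responses)


# Hypothetical results: 20 trials on each of four sub-tests.
results = {
    "detection": [True] * 19 + [False] * 1,
    "classification": [True] * 18 + [False] * 2,
    "heading": [True] * 19 + [False] * 1,
    "range": [True] * 9 + [False] * 11,
}

per_subtest = {name: accuracy(r) for name, r in results.items()}
all_responses = [r for responses in results.values() for r in responses]
global_accuracy = accuracy(all_responses)
weakest = min(per_subtest, key=per_subtest.get)

print("global accuracy:", global_accuracy)
print("per sub-test:", per_subtest)
print("weakest area:", weakest)
```

The global score of roughly 81% gives no hint that one sub-test is performed at below 50%, which is why the sub-test scores should be reported alongside any combined measure.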

Chronometric  measures  such  as  response  time  and  completion  speed  have  been  widely  used 
in  conjunction  with  accuracy  scores  as  a  means  of  assessing  display  utility.  It  is  generally 
recommended  that  researchers  collect  both  chronometric  and  accuracy  data,  as  this  provides  a 
more  complete  view  of  task  performance  than  either  measure  alone.  For  example, 
performance  may  be  more  accurate  in  one  experimental  condition,  but  the  subjects  may  take 
longer  to  complete  the  task.  This  information  could  be  of  vital  importance  when  one  considers 
that  in  practical  situations  display  operators  are  often  required  to  make  accurate  decisions  in 
limited  time.  The  advantage  provided  by  a  particular  display  configuration  may  not  be  worth 
the  amount  of  time  needed  to  make  the  decision. 

In  some  studies  response  time  has  been  used  as  the  sole  measure  of  task  performance.  This 
can  be  useful  in  situations  for  which  an  accuracy  measure  is  inappropriate.  For  example,  in  a 
study by Baumann et al. (1997), subjects were required to identify descending aircraft using
either a 2-D or a 3-D display. Of prime importance in this instance was
how long the subjects took to access the data, not how accurate they were: because information
regarding platform heading was explicitly represented in both displays, subject accuracy
was expected to be high. Another example is a study by St. John and colleagues in which
subjects  were  required  to  create  a  chain  of  antennae  towers  that  were  within  line-of-sight  of 
each  other  but  hidden  from  enemy  towers  (St.  John  et  al.,  2001b).  During  the  course  of  the  task 
the  subjects  were  advised  as  to  whether  the  placement  of  each  tower  contravened  task 
requirements.  As  a  consequence  of  this,  no  meaningful  accuracy  score  could  be  collected,  but 
the  display  conditions  could  be  differentiated  in  terms  of  speed  of  task  completion. 

Subjective  confidence  judgments  are  a  less  common  form  of  performance  assessment  than 
either  accuracy  or  response  time  measures.  However,  they  can  provide  important  insight  into 
subject responses. For example, the results of a study by Wickens et al. (2000) indicated that
subjects responded with equal confidence regardless of experimental condition, despite the
fact that there were significant differences in accuracy between conditions.
Confidence is usually measured by asking subjects to indicate how confident they were
of  a  response  by  marking  a  point  on  a  scale,  ranging  from  low  or  no  confidence  to  high 
confidence.  To  ensure  that  the  subjects  give  equal  consideration  to  the  full  range  of  choices, 
the number of items on the scale should probably not exceed the limits of STM capacity (7 ± 2
items), and any fewer than 5 items will not provide the subjects with a large enough range of
choices. Additionally, designers should avoid using an even number of items, as this does not
allow the subjects to mark a mid-point.

On the other hand, Vickers (1979) and Vickers and Lee (1998) suggest that there is
well-replicated evidence that accuracy, response time and confidence levels fluctuate in meaningful
and  predictable  ways.  Generally  speaking,  when  subjects  are  required  to  respond  as 
accurately  as  possible  (as  opposed  to  responding  as  quickly  as  possible),  accuracy  and 
confidence  tend  to  be  positively  correlated,  and  accuracy  and  response  time,  and  response 
time  and  confidence  tend  to  be  negatively  correlated.  Additionally,  response  times  tend  to  be 
longer  and  confidence  scores  lower  for  incorrect  responses  in  comparison  to  correct  responses. 
Knowledge of this interaction between performance measures can be useful when interpreting data,
particularly with regard to response bias and response utility. For example, in a two-choice
situation,  if  the  likelihood  of  making  a  correct  decision  is  greater  for  one  response  than  the 
other,  subjects  will  make  that  choice  more  often  and  their  corresponding  confidence  responses 
will  be  high.  Alternatively,  if  the  benefit  of  making  one  response  is  greater  than  the  other, 
subjects  will  be  biased  toward  making  that  response  regardless  of  the  likelihood  of  that 
response  occurring.  Response  times  and  accuracy  scores  may  be  the  same  as  in  the  previous 
instance,  but  confidence  scores  can  be  expected  to  be  lower. 
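
One way of checking whether a data set shows the pattern of correlations described above is sketched here. The trial records are invented for illustration only, and the 7-point confidence scale is an assumption; the sketch simply computes the three pairwise Pearson correlations so that their signs can be compared with the expected pattern.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical trials: (correct? 1/0, response time in seconds, confidence 1-7).
trials = [
    (1, 0.62, 6), (1, 0.70, 7), (1, 0.81, 5), (0, 1.35, 2), (1, 0.66, 6),
    (0, 1.10, 3), (1, 0.90, 5), (0, 1.48, 2), (1, 0.75, 6), (1, 0.58, 7),
]
correct = [t[0] for t in trials]
response_time = [t[1] for t in trials]
confidence = [t[2] for t in trials]

r_accuracy_confidence = pearson(correct, confidence)
r_accuracy_time = pearson(correct, response_time)
r_time_confidence = pearson(response_time, confidence)

print("accuracy vs confidence:", round(r_accuracy_confidence, 2))
print("accuracy vs response time:", round(r_accuracy_time, 2))
print("response time vs confidence:", round(r_time_confidence, 2))
```

A departure from the expected signs (positive, negative, negative) under accuracy instructions would be a prompt to look for response bias or unequal response utilities of the kind discussed above.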

5.3  Assessing  Display  Utility:  Summary 

This  section  outlined  a  number  of  factors  relevant  to  the  assessment  of  display  utility: 

• Subjective and empirical data were described and their comparative appropriateness was
evaluated.

• Various performance measures and their uses were described.


6. Future Directions for Research in Submarine Display Design

While  reviewing  the  literature  on  visual  display  design  it  became  evident  that  there  was  a 
need  for  research  to  be  conducted  in  three  specific  areas.  The  following  section  outlines  these 
recommended  future  directions,  and  provides  preliminary  recommendations  of  how  research 
in these areas might proceed. Research in these areas has the potential to uncover important
novel  theoretical  developments  and  new  technologies. 

6.1  Visualising  Uncertainty 

As  has  been  mentioned,  there  is  a  pressing  need  to  develop  ways  of  visualising  uncertainty. 
While  this  has  been  recognised  in  the  literature  (St.  John  et  al.,  2000a),  there  are,  as  yet,  few 
empirical  investigations  of  the  problem.  The  fundamental  question  is,  of  course,  how 
uncertainty  should  be  visualised  formally.  A  number  of  different  methods  have  been 
proposed,  including  numerical  representations,  ellipses,  colour  coding,  and  blurring. 
However,  there  have  been  no  investigations  of  what  types  of  visualisation  methods  are 
suitable  for  particular  situations.  Additionally,  it  may  be  that  particular  visualisation  methods 
are  only  suitable  for  representing  specific  types  of  uncertainty.  For  example,  ellipses  may  be 
best  suited  to  representing  positional  uncertainty,  but  not  type  uncertainty. 

The  problem  is  further  complicated  when  considering  that  multiple  uncertainties  may 
need  to  be  represented  simultaneously.  For  example,  in  sub-surface  environments  it  is  not 
uncommon  for  there  to  be  a  degree  of  uncertainty  associated  with  the  type,  position,  heading 
and  velocity  of  a  target  platform.  Simultaneously  representing  multiple  types  of  uncertainty 
poses  a  number  of  problems.  First,  given  that  there  may  be  a  restricted  number  of  plausible 
ways  of  representing  uncertainty,  there  may  be  restrictions  upon  the  number  of  dimensions 
that  can  be  represented  concurrently.  Furthermore,  there  has  been  no  research  into  how 
different  combinations  of  simultaneously  represented  uncertainty  visualisations  might 
interact. 

An  additional  problem  that  needs  to  be  addressed  is  how  particular  distributions  of 
uncertainty  should  be  represented.  For  example,  positional  uncertainty  is  often  represented  by 
an  ellipse.  If  the  uncertainty  data  are  drawn  from  Gaussian  distributions  then  the  centre  of  the 
ellipse  represents  the  point  of  greatest  certainty  in  the  position  of  the  target  platform,  with 
certainty  decreasing  monotonically  towards  the  edge  of  the  ellipse.  However,  if  the 
distribution  is  actually  bi-  or  multi-modal,  then  such  a  representation  poses  problems. 
Representing bi-modal distributions with an ellipse may create the impression that the point of
greatest  certainty  is  central  when  this  is  not  the  case. 
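
The problem can be made concrete with a one-dimensional sketch. The assumptions here are ours: positional uncertainty is reduced to a single axis, the true distribution is a hypothetical equal mixture of two Gaussians, and the ellipse is represented by the single Gaussian with the same mean and variance as that mixture.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components are (weight, mean, sd)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def moment_matched_gaussian(components):
    """Mean and sd of the single Gaussian with the mixture's mean and variance."""
    mean = sum(w * m for w, m, _ in components)
    variance = sum(w * (s * s + (m - mean) ** 2) for w, m, s in components)
    return mean, math.sqrt(variance)


# Hypothetical bi-modal positional uncertainty: two equally likely locations.
components = [(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)]

centre, spread = moment_matched_gaussian(components)
true_density_at_centre = mixture_pdf(centre, components)
true_density_at_mode = mixture_pdf(2.0, components)
implied_density_at_centre = normal_pdf(centre, centre, spread)

print("centre of the single-Gaussian summary:", centre)
print("true density at that centre:", true_density_at_centre)
print("true density at one mode:", true_density_at_mode)
print("density implied by the summary at its centre:", implied_density_at_centre)
```

The summary places its peak at the centre, where the true density is several orders of magnitude lower than at either mode, which is the misleading impression described above.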

Ideally,  what  is  required  is  the  capability  to  display  uncertainty  about  both  physical  and 
abstract  variables  simultaneously,  in  a  way  that  is  accurately  and  effortlessly  comprehended 
by the display user. For example, consider a scenario where sensor data imply a multi-modal
probability distribution for the location of a sub-surface object, and give a set of classificatory
probabilities for what the object is. Developing psychologically principled techniques for
representing  this  uncertain  information  is  an  important  and  challenging  problem.  Its  solution 
would  significantly  enhance  the  utility  of  data  visualisation  systems  in  the  submarine 
environment. 

An  obvious  method  of  testing  the  comparative  utility  of  rival  uncertainty  representations 
would  be  to  measure  the  performance  of  subjects  on  real-world  tasks  using  the  various 
uncertainty representations. Subjects could be randomly assigned to either a control group or
a treatment group. Both groups would perform the same task; for example, they may be
required to locate target platforms and identify each platform's type, speed and heading. In the
control condition uncertainty could be represented by numerical data, whereas in the
treatment condition uncertainty might be explicitly represented. In addition to task
performance, it might be appropriate to measure response time and accuracy. Comparisons
could then be made between the two conditions.


6.2  Decision  Support  Systems 

Real  world  human  decision  making  involves  large  amounts  of  incomplete,  uncertain  and 
unreliable  information,  and  is  set  in  a  complex  and  richly  structured  environment  that  is 
changing  constantly.  Set  against  these  challenges,  it  is  not  surprising  that  most  of  the  real 
world  decisions  made  by  people  resist  complete  formal  characterisation,  and  so  are  difficult,  if 
not  impossible,  to  automate  fully  in  artificial  systems.  Nevertheless,  it  is  useful,  where 
possible,  to  automate  some  of  the  more  routine  parts  of  human  decision  making  that  are  well 
understood.  This  reduces  the  cognitive  load  on  users,  and  allows  their  limited  decision 
making  resources  to  be  focussed  on  the  most  difficult  problems.  The  remarkable  abilities  of 
human  visual  and  cognitive  processing  are  best  used  to  detect  patterns,  draw  inferences,  make 
generalisations,  develop  and  test  hypotheses,  generate  queries,  and  work  towards  conclusions 
that  are  not  easily  automated. 

Decision  support  systems  attempt  to  achieve  this  balance  by  using  decision  making  models 
for  simple  tasks,  or  to  suggest  possible  answers,  requiring  human  confirmation,  for  more 
difficult  tasks.  In  this  way,  decision  support  systems  become  tools  that  extend  human  decision 
making  capabilities,  providing  not  only  an  interface  to  the  information  needed  for  better 
decisions,  but  also  bearing  some  of  the  decision  making  load. 

A  considerable  body  of  applied  and  theoretical  research  in  psychology,  cognitive  science, 
computer  science,  and  other  fields  has  been  devoted  to  developing  and  evaluating  decision 
support  systems.  A  number  of  automated  decision  aids  are  currently  being  developed  for 
military  applications  (Liebhaber  &  Feher,  2002).  The  most  significant  challenge,  and  the  area 
with  the  greatest  potential  for  a  "game  changing"  advance,  is  in  developing  suitable  cognitive 
models  of  human  decision  making.  One  possible  research  direction  is  to  use  the  ecologically 
rational  or  'Fast  and  Frugal'  decision  making  heuristics  outlined  by  Todd  &  Gigerenzer  (1999). 
There  is  considerable  empirical  evidence  to  suggest  that  human  decision  making  closely 
resembles  Gigerenzer's  ecologically  rational  model,  and  unlike  alternative  non-classically 
rational  models  such  as  Klein's  (Klein,  1999)  'Naturalistic'  decision  making  model,  'Fast  and 
Frugal' heuristics are easily operationalised as decision making algorithms. Although no
single algorithm may be able to deliver an optimal solution to a problem, an algorithm based
upon  a  model  of  human  decision  making  may  be  able  to  produce  human-like  responses, 
thereby  considerably  lessening  the  cognitive  load  of  the  users. 

One  way  of  testing  the  effectiveness  of  decision-making  heuristics  would  be  to  compare 
the  performance  of  various  decision-making  models  with  the  performance  of  human  subjects. 
In  order  to  do  this  it  would  be  necessary  to  gather  information  about  the  various  cues  that 
experienced  individuals  use  to  perform  real-world  tasks.  For  example,  it  should  be  possible  to 
gather  a  list  of  all  of  the  cues  used  in  Target  Motion  Analysis.  Expert  users  would  then  rate 
these  cues  in  terms  of  their  effectiveness  or  salience.  A  number  of  rival  models  could  be 
developed, for example, 'Fast and Frugal' models using minimal cues, and 'Unbounded' or
traditionally  rational  models  (such  as  multiple  regression)  using  all  of  the  cues. 
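
The contrast between the two families of model can be sketched as follows. Take The Best is used here as a representative fast and frugal heuristic, and a simple weighted sum stands in for a regression-style model that uses every cue; both choices are ours. The cue names, the binary cue values and the weights (imagined here as expert ratings) are hypothetical.

```python
def take_the_best(option_a, option_b, cue_order):
    """Decide using only the first cue, in order, on which the options differ."""
    for cue in cue_order:
        if option_a[cue] != option_b[cue]:
            return "a" if option_a[cue] > option_b[cue] else "b"
    return None  # no cue discriminates, so the choice would have to be a guess


def weighted_additive(option_a, option_b, weights):
    """Decide by comparing weighted sums over all of the cues."""
    score_a = sum(w * option_a[cue] for cue, w in weights.items())
    score_b = sum(w * option_b[cue] for cue, w in weights.items())
    if score_a == score_b:
        return None
    return "a" if score_a > score_b else "b"


# Hypothetical expert ratings of three cues (higher means more useful).
weights = {"cue_1": 0.9, "cue_2": 0.6, "cue_3": 0.5}
cue_order = sorted(weights, key=weights.get, reverse=True)

# Two hypothetical options described by binary cues (1 = cue present).
option_a = {"cue_1": 1, "cue_2": 0, "cue_3": 0}
option_b = {"cue_1": 0, "cue_2": 1, "cue_3": 1}

frugal_choice = take_the_best(option_a, option_b, cue_order)
full_choice = weighted_additive(option_a, option_b, weights)

print("Take The Best chooses:", frugal_choice)
print("Weighted sum over all cues chooses:", full_choice)
```

Cases such as this one, in which the two models disagree, are the informative ones when the models' choices are compared with those of human subjects.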

Of course, the human capacity for decision making need not bound the decision
support technology or processes. The most effective decision making model is the one that
makes the most effective decisions, whether it is framed by human performance, purely
machine generated, or some combination of both.


6.3  Exploiting  the  Human  Facility  to  Interpret  Affective  Data 

Affect  is  a  general  term  that  is  used  to  encompass  a  range  of  phenomena  related  to  emotion 
and  mood.  It  is  also  used  to  refer  to  ideas  such  as  feelings,  mental  state,  unease  and  trust.  The 
expression  and  recognition  of  affect  is  crucial  for  the  communication  of  understanding.  If 
expressive  components  of  affect  play  an  essential  role  in  the  communication  of  internal  states 
and  the  promotion  of  natural  interactions,  endowing  our  computer  systems  with  expressive 
skills  may  enable  us  to  capitalise  on  the  inherent  human  facilities  that  support  social 
interaction. 

Because  affective  information  is  readily  interpreted,  the  potential  exists  for  using  affective 
information  to  convey  uncertainty  data.  For  example,  a  decision  support  system  may  be  able 
to  provide  an  interface  user  with  a  number  of  potential  action  plans.  Each  of  these  plans  may 
have an associated degree of uncertainty. Avatars (i.e., human-like computer representations)
could  be  used  to  convey  the  various  plans,  thereby  simultaneously  imparting  affective 
information  to  the  interface  user  that  can  be  interpreted  in  terms  of  uncertainty.  In  other 
words,  avatars  of  an  untrustworthy  appearance  might  convey  high-risk  plans,  and  honest  or 
reliable  looking  avatars  could  convey  information  that  has  a  high  degree  of  associated 
confidence. 

More  generally,  there  is  an  enormous  untapped  potential  for  using  affective  information, 
like  emotions,  to  improve  data  visualisation  systems.  Human  decision  making  evolved  in  an 
environment  where  both  affective  information  and  abstract  conceptual  information  were 
important.  This  suggests  that  information  might  be  communicated  to  people  in  both  ways, 
conveying some data, like trustworthiness or reliability, that are well suited to affective
representation through emotions, and conveying other data, like spatial position, that are
more abstract and conceptual in nature through conventional means. The potential benefit of
adopting this approach is two-fold. First, it is likely that a greater volume of data could be
conveyed more rapidly, because a greater array of the possibilities for communicating
information to people is being used. Metaphorically, using a new protocol increases the
bandwidth of the communication channel between the data and the human. Secondly, some
aspects  of  the  data,  like  reliability  measures,  are  better  communicated  in  an  affective  form. 
Human  decision  making  will  be  enhanced  because  the  users  can  'feel'  that  some  information 
is  unreliable,  rather  than  have  to  integrate  this  knowledge  as  a  separate  abstract  conceptual 
fact.  Metaphorically,  the  new  affective  protocol  provides  a  better  encoding  of  some  aspects  of 
the  data,  and  so  allows  these  data  to  be  accurately  and  effortlessly  understood. 

Testing the usefulness of affective data as a decision support aid could be achieved using a
methodology similar to that proposed for testing uncertainty representations (see above).
Subjects  could  be  randomly  assigned  to  either  a  treatment  or  control  condition,  and  their 
performance  on  a  simulated  real-world  task  could  be  measured.  Subjects  in  the  control 
condition  might  be  asked  to  perform  a  task  with  conventional,  non-affective  decision  support, 
whereas  subjects  in  the  experimental  condition  would  be  provided  with  decision-making 
support  with  an  affective  component,  perhaps  in  the  form  of  an  avatar  that  would  offer  advice 
on  key  decisions.  The  uncertainty  associated  with  the  avatar's  advice  could  be  represented  by 
manipulating  the  facial  features  or  voice  of  the  avatar.  In  other  words,  uncertain  information 
would  be  conveyed  by  avatars  with  an  untrustworthy  appearance,  and  certain  information 
conveyed  by  trustworthy-looking  avatars.  If  affective  decision  support  is  a  useful  device  then 
the  performance  of  subjects  in  the  experimental  condition  should  be  better  than  the 
performance of subjects in the control condition.


6.4 Research Directions: Summary

This section outlined three areas recommended for future research:

•  Visualising  uncertainty 

•  Decision  support  system 

•  Affective  data 

Examples were provided detailing how these areas might be empirically researched.


Appendix A: The Generative Transformational
Approach to Visual Perception.

A.1. Theories of visual perception

So  far  as  the  perceptual  aspect  of  displays  is  concerned,  it  is  convenient  to  distinguish  three 
main  approaches.  These  focus  on  successive  stages  of  the  processing  of  visual  information  by 
considering  (1)  the  geometry  of  projected  images,  (2)  the  responses  by  specific  cells  and 
channels,  and  (3)  the  contribution  by  cognitive  processes. 

Geometrical approaches to both human and machine vision (e.g., Van Gool et al., 1994; Mundy
&  Zisserman,  1992)  generally  focus  on  analysing  invariant  properties  of  a  projected  image  that 
result  from  various  interactions  between  a  light  source,  an  object  or  scene  and  an  observer, 
and  provide  information  for  the  perception  of  depth  and  three-dimensional  layout  in 
ecologically  representative  situations.  These  properties  include  invariants  associated  with  the 
perception  of  shape,  slant,  biological  motion,  the  direction  of  self-movement  and  imminent 
collision  (e.g.,  Wagemans  et  al.,  2000;  Cutting  &  Readinger,  2002). 

Neurophysiological  theorists  concentrate  on  explicating  the  relations  between  the  visual  image 
and  the  response  of  specific  cells  and  on  trying  to  reconstruct  an  organised  'perception'  from 
their  interactions  (Bullier,  2002).  A  related,  but  somewhat  different,  approach  has  been 
pursued  by  researchers  who  believe  that  complex  perceptions  can  be  assembled  from  the 
responses  of  filter  mechanisms  that  are  sensitive  to  extended  visual  patterns,  characterised  by 
different  spatial  frequencies  (Westheimer,  2001). 

Cognitive  theories  emphasise  the  contribution  of  'higher-level'  brain  processes  that,  it  is 
claimed,  are  necessary  to  interpret  the  underdetermined  responses  by  the  visual  system  to  the 
information  presented  by  an  ambiguous  image.  Cognitive  theorists  look  for  explanations  of 
perceptual  achievements  in  terms  of  general  optimising  principles,  such  as  maximising 
symmetry,  simplicity  or  likelihood,  (e.g.,  Dodwell,  1992;  Chater  &  Vitanyi,  2003;  Feldman, 
2000). 

A.2.  A  generative  transformational  theory 

Recently,  Vickers  (2001;  2002)  has  developed  a  generative  transformational  (GT)  theory  that 
attempts  to  bring  together  these  different  approaches.  According  to  this  approach,  the  visual 
system  uses  information  about  the  relations  between  the  positions  of  elements  composing  an 
image  to  select  geometrical  transformations  that,  when  applied  to  these  elements,  generate  a 
representation  that  maximises  the  degree  of  self-similarity  within  the  representation  and 
minimises  the  difference  between  the  representation  and  the  current  retinal  input.  In  other 
words,  the  visual  system  is  thought  to  construct  the  most  parsimonious  description  of  the 
image elements, consistent with the original image. (Here, the term 'self-similarity' is used in a
generalised sense to mean a statistical resemblance of any one part of the representation to
any  other  part,  including  the  whole  representation.) 

The  idea  motivating  this  approach  is  that  of  fractal  compression,  in  which  the  information  in 
an  image  is  encoded  in  the  parameters  of  a  collage  of  transformations  that,  when  applied 
recursively  to  a  small  set  of  image  elements,  are  capable  of  generating  a  close  statistical 
approximation  to  the  original  image  (Barnsley  &  Hurd,  1993).  However,  the  process  by  which 
this  compression  is  achieved  in  the  GT  approach  is  biologically  inspired,  rather  than  based  on 
computational  techniques  developed  in  the  field  of  machine  vision.  In  particular,  the  selection 
of  appropriate  transformations  is  assumed  to  be  guided  by  relational  information  about  the 
positions of stimulus elements. The pivotal assumption here is that processing by the visual
system depends on relations between nearest neighbours (nearest, second-nearest, ..., kth-nearest),
rather than on absolute distances.
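
To make the notion of kth-nearest neighbour relations concrete, the following Python sketch computes, for every element of a small two-dimensional array, the distance to its kth-nearest neighbour. It is a brute-force illustration only; the function name and the example coordinates are assumptions introduced here and are not part of the GT model itself.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kth_nn_distances(points: List[Point], k: int = 1) -> List[float]:
    """Return, for each point, the distance to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other element, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Hypothetical three-element array lying on a line.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(kth_nn_distances(points, k=1))  # [1.0, 1.0, 2.0]
print(kth_nn_distances(points, k=2))  # [3.0, 2.0, 3.0]
```

Because only the rank order of distances determines which element counts as the kth neighbour, enlarging or shrinking the whole array leaves the neighbour relations unchanged.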

A  second  important  aspect  of  this  approach  concerns  the  interpretation  of  the  hypothesised 
transformations.  An  underlying  assumption  is  that  these  transformations  have  evolved  from 
the internal direction of overt actions (cf. Shepard, 2001a; 2001b; Vickers, 2001). This
assumption  means  that  the  range  of  permissible  transformations  is  limited  to  those  that  can  be 
physically  realised.  It  suggests  that  perceived  symmetries  are  likely  to  facilitate  the  translation 
of  seen  structure  into  overt  action.  This  assumption  also  provides  an  interpretation  of  the 
otherwise  nebulous  concept  of  'perceived  organisation'.  According  to  this  view,  perceived 
organisation corresponds to the trajectories of those transformations that maximise the degree
of generalised self-similarity in a representation and minimise the mismatch between the
generated representation and the presented image. That is, what we experience as a
(non-visible) 'link' between one set of image elements and another corresponds to the minimum
path  that  a  physical  action  would  follow  in  bringing  the  first  set  of  elements  into  the  closest 
correspondence  with  the  second  set. 

A.3.  General  properties  of  the  generative  transformational  approach  as  a 
theory  of  visual  perception 

The  applicability  of  the  GT  approach  to  the  perceptual  aspects  of  display  design  ultimately 
depends  upon  the  adequacy  of  the  GT  approach  as  a  theory  of  visual  perception.  As  indicated 
earlier,  a  summary  evaluation  of  the  GT  theory  is  naturally  concerned  with  the  dependence  of 
the  theory  on  information  about  relative  position  and  the  importance  in  the  theory  of 
invariance  or  symmetry  under  transformation. 


A.3.1  The  importance  of  nearest  neighbour  relations  for  visual  perception. 

Despite the advantages of nearest neighbour analysis (NNA), there is little explicit recognition
in studies of visual perception that the statistical properties of spatial point processes have been extensively
analysed  in  a  wide  variety  of  other  contexts.  These  include  ecology,  geography  and 
geophysics,  as  well  as  geometric  probability  and  spatial  statistics  (e.g.,  Cressie,  1993;  Upton  & 
Fingleton, 1985).

Relation to neurophysiological findings.

A possible reason for the neglect of nearest neighbours (NNs) may be that the implications of NNA are difficult to
reconcile  with  neurophysiological  approaches  based  solely  on  classical  receptive  field  (CRF) 
architectures.  If  CRFs  correspond  to  structures  defined  in  absolute  spatial  terms,  and  are 
independent,  they  should  be  associated  with  similar  responses  whenever  those  structures  are 
present  and  independently  of  contextual  information.  In  contrast,  because  NNA  is  based  on 
relations  among  array  elements,  NNA  predicts  that  responses  should  depend  on  relative, 
rather  than  absolute  distances,  and  should  be  influenced  by  the  distribution  and  density  of  all 
array  elements. 

Nevertheless,  there  are  several  recent  neurophysiological  findings  that  are  consistent  with  the 
implications  of  NNA.  For  example,  Motter  &  Belky  (1998)  found  that  target  detection  occurred 
only  within  a  restricted  area,  determined  by  element  density,  and  concluded  that  attention 
operates  within  an  area  having  a  radius  of  twice  the  average  NN  distance.  Meanwhile,  Brady 
et  al.  (1997)  and  Bex  &  Dakin  (2003)  showed  that  the  visual  system  is  sensitive  to  motion 
information  at  widely  separate  spatial  frequencies  and  over  a  range  of  spatial  scales.  Similarly, 
Dakin  &  Herbert  (1998)  found  that  the  spatial  integration  region  for  the  perception  of 
reflective  symmetry  varied  inversely  with  peak  spatial  frequency.  Such  scale-invariant 
properties  correspond  well  with  recent  evidence  that  element  density  is  the  property  that  the 
visual  system  uses  to  implement  scale  invariance  in  the  perception  of  symmetry  and  Glass 
pattern  structure  (e.g.,  Rainville  &  Kingdom,  2002).  Both  scale-invariance  and  the 
determination  of  perception  by  element  density  are  central  implications  of  NNA. 

Relation  to  psychological  findings. 

The importance of NN relations follows from the fact that NNs are concerned with relative
distances and, in consequence, that NN distributions have certain characteristics. As a
result, an NNA-based approach can provide a resolution and explanation for certain
problems and findings, common to the perception of structure, motion and depth.

The  correspondence  problem. 

For  example,  the  use  of  well-populated  random  dot  arrays  in  experiments  on  visual 
perception  has  highlighted  a  general,  so-called  correspondence  problem,  common  to  the 
perception  of  structure,  motion  and  depth.  If  a  single  element  is  displaced,  from  one  frame  to 
the  next,  by  regular  increments  in  a  specific  direction,  within  a  background  of  elements  that 
are  distributed  randomly  and  independently  in  successive  frames,  then  it  is  possible  to  detect 
the  constrained  motion  of  the  single  element.  However,  because  each  element  in  one  frame 
could logically be paired with any element in the succeeding frame, there are n! possible
correspondences  for  the  visual  system  to  consider.  A  similar  computational  problem  arises 
with  the  detection  of  structure  in  Glass  patterns  (Glass,  1969)  and  with  the  matching  of 
stimulus  elements  in  the  two  halves  of  a  pair  of  random  dot  stereograms  (Palmer,  1999). 

NNA  deals  with  sets  of  (least)  distances  between  pairs  of  points,  the  members  of  which  can 
belong  to  a  single  array  or  to  two  spatially  or  temporally  distinct  arrays.  Despite  its  simplicity, 
this  property  has  important  consequences.  For  example,  NNA  does  not  consider  interactions 
among  three  or  more  sets  of  points.  This  agrees  with  evidence  that  the  visual  system  does  not 
require  such  information  and  does  not  process  it  (e.g.,  Todd,  1994;  Williams  &  Sekuler,  1984). 
A second consequence is that the same processing can be applied to a single array, or two
successive or spatially distinct arrays. This means that, in principle, the perception of
structure,  motion  and  depth  can  all  be  explained  by  a  single  type  of  processing  mechanism, 
and  that  similar  information  can  give  rise  to  any  -  or  all  -  of  these  perceptions.  A  third 
consequence  (most  directly  relevant  to  the  correspondence  problem)  is  that  attention  is 
restricted  to  a  subset  of  all  possible  inter-element  distances.  Because  each  element  has  one  NN, 
only  n  distances  need  to  be  considered  -  a  significant  reduction  in  computation. 
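
The scale of this reduction can be sketched in a few lines of Python. The example below counts the n! one-to-one assignments between two frames and then pairs each element of the first frame with its nearest element in the second, giving n pairings. The function name and the coordinates are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nn_correspondences(frame_a: List[Point], frame_b: List[Point]) -> List[int]:
    """For each element of frame_a, return the index of its nearest element in frame_b."""
    return [min(range(len(frame_b)), key=lambda j: math.dist(p, frame_b[j]))
            for p in frame_a]


# Hypothetical frames: frame_b is frame_a shifted 0.5 units to the right,
# with its elements listed in a different order.
frame_a = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
frame_b = [(0.5, 5.0), (0.5, 0.0), (5.5, 0.0)]

n = len(frame_a)
print(math.factorial(n))                     # 6 possible one-to-one assignments
print(nn_correspondences(frame_a, frame_b))  # [1, 2, 0], one NN pairing per element
```

This brute-force search still inspects every candidate distance in order to find each nearest neighbour, so the saving illustrated is in the number of pairings retained (n rather than n!), not in the cost of the search itself.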

Sensitivity  to  element  density. 

Experiments  employing  dot  patterns  have  also  produced  the  ubiquitous  empirical  finding  that 
the  perception  of  structure,  motion  and  depth  are  all  influenced  by  the  density  of  pattern 
elements.  Although  intuitively  unremarkable,  this  commonplace  result  represents  something 
of  a  challenge  for  theory. 

A  second  characteristic  of  NNA  allows  it  to  account  for  this  finding  and  to  differentiate 
between  two  major  classes  of  explanation:  one  in  terms  of  scale-bound  mechanisms  that 
respond  on  the  basis  of  absolute  size  or  a  fixed  spatial  frequency;  and  a  (less  popular) 
alternative,  based  on  relative  distance.  A  critical  instrument  in  this  differentiation  is  the  fact 
that  the  distribution  of  all  inter-element  distances  in  a  random  array  has  very  different 
properties  from  that  of  NN  distances. 

As  shown  by  Diggle  (1983),  the  distribution  of  all  inter-element  distances  for  a  random  pattern 
is  approximately  normal.  Importantly,  the  probability  distribution  function  has  no  density 
parameter,  so  that  the  mean  and  all  characteristics  of  the  distribution  remain  the  same,  no 
matter  how  many  elements  there  are.  This  means,  for  example,  that  any  perceptual  process 
that  is  potentially  influenced  by  all  inter-element  distances  (such  as  an  array  of  overlapping 
motion  detectors,  each  tuned  to  motion  over  a  different  absolute  distance),  should  respond  in 
much  the  same  way,  irrespective  of  element  density. 

In contrast, NN distances conform to a distribution of which the mean and variance are
specified by element density. Unlike a scale-bound mechanism based on absolute
distance, such a representation of stimulus information appears to be appropriately sensitive
to variations in element density.
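
The contrast between the two distributions can be checked with a small simulation. The Python sketch below scatters elements at random in a unit square and compares the mean of all inter-element distances with the mean NN distance at two element densities. The function names, sample sizes and seed are arbitrary assumptions. For reference, in an unbounded random (Poisson) pattern of density λ the expected NN distance is 1/(2√λ), so quadrupling the density should roughly halve it.

```python
import math
import random
from itertools import combinations


def random_array(n, rng):
    """n elements placed uniformly at random in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def mean_all_distances(points):
    """Mean of the distances between every pair of elements."""
    d = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(d) / len(d)


def mean_nn_distance(points):
    """Mean distance from each element to its nearest neighbour."""
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    return sum(nn) / len(nn)


def summarise(n, replicates=20, seed=0):
    """Average both statistics over several independent random arrays."""
    rng = random.Random(seed)
    arrays = [random_array(n, rng) for _ in range(replicates)]
    mean_all = sum(mean_all_distances(a) for a in arrays) / replicates
    mean_nn = sum(mean_nn_distance(a) for a in arrays) / replicates
    return mean_all, mean_nn


for n in (50, 200):
    mean_all, mean_nn = summarise(n)
    print(f"n={n:3d}  mean of all distances={mean_all:.3f}  mean NN distance={mean_nn:.3f}")
```

In this sketch the first statistic should stay close to the same value at both densities, whereas the second should shrink as elements are added, which is the property that makes NN distances informative about density.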


Sensitivity  to  clustering  and  symmetries. 

Within  the  GT  approach,  differences  between  the  distribution  of  NN  distances  and  that  of  all 
inter-element  distances  provide  a  way  of  detecting  the  presence  of  constraint  in  an  array  of 
elements through a process of signal/noise differentiation, implemented in a sequential
sampling  decision  mechanism  of  the  kind  described  by  Vickers  and  Lee  (1998;  2000). 

For  example,  on  an  approach  based  on  CRF  architecture,  a  possible  explanation  for  seeing 
particular  elements  as  clustered  together,  and  for  the  number  of  clusters  seen,  is  that  all  dots 
within  a  fixed  distance  from  each  other  are  seen  as  linked.  This  predicts  that  the  number  of 
clusters detected should increase as a function of the number of inter-element distances,
n(n-1), when the number of elements, n, is increased. Conversely, the number of clusters detected
should halve when the scaling of an element array is doubled. An alternative explanation is
that  perceived  links  correspond  to  NN  links,  and  hence  depend  on  relative  distance.  This 
predicts  that  the  number  of  clusters  detected  should  increase  as  a  linear  function  of  the 
number  of  NN  distances,  n,  when  the  number  of  dots  is  increased.  However,  the  number  of 
clusters  detected  should  remain  the  same  when  the  scaling  of  an  element  array  is  doubled. 
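
The different behaviour of the two linking rules under a change of scale can be sketched as follows. The Python example links elements either by a fixed absolute distance or by NN relations, and then repeats both analyses on the same array at twice the scale. The function names, the radius of 0.1 and the array size are illustrative assumptions, and the sketch counts links rather than clusters.

```python
import math
import random
from itertools import combinations


def threshold_links(points, radius):
    """Links between all pairs of elements closer than a fixed, absolute distance."""
    return {(i, j) for (i, p), (j, q) in combinations(enumerate(points), 2)
            if math.dist(p, q) <= radius}


def nn_links(points):
    """Links between each element and its nearest neighbour (a relative criterion)."""
    links = set()
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        links.add((min(i, j), max(i, j)))  # store each link once, whatever its direction
    return links


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(100)]
doubled = [(2 * x, 2 * y) for x, y in points]  # the same array at twice the scale

# Number of fixed-distance links before and after the change of scale.
print(len(threshold_links(points, 0.1)), len(threshold_links(doubled, 0.1)))
# Whether the set of NN links is the same before and after the change of scale.
print(nn_links(points) == nn_links(doubled))
```

Doubling the scale removes many fixed-distance links because fewer pairs fall within the radius, whereas the NN links depend only on the rank order of distances and so are unaffected.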

In  agreement  with  the  GT  approach,  Vickers,  Preiss  and  Hughes  (submitted)  found  that  both 
the  number  of  links  and  the  number  of  clusters  detected  by  subjects  in  random  element  arrays 
increased,  as  predicted,  as  a  linear  function  of  the  number  of  elements  in  the  array. 

Structure  in  a  visual  array  can  also  result  from  uniform  transformations  applied  to  all 
elements of the array. For example, Glass patterns are produced by superimposing a
transformed  copy  of  an  array  of  randomly  positioned  elements  (usually  dots)  on  the  original 
array.  Marroquin  patterns  (Marroquin,  1976;  see  also  Earle,  1991)  are  produced  by 
superimposing  a  transformed  copy  of  a  regular  array  (such  as  a  lattice  of  dots)  on  the  original. 
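
The construction of a Glass pattern can be sketched in Python as follows. The example generates a random dot array, applies a small rotation (optionally combined with a uniform expansion) about the centre, and superimposes the transformed copy on the original. The function name, parameter names and default values are assumptions chosen for illustration.

```python
import math
import random


def glass_pattern(n, angle_deg=5.0, scale=1.0, seed=0):
    """Return 2n dots: n random dots followed by their transformed partners."""
    rng = random.Random(seed)
    original = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate each dot about the origin and, optionally, expand it by 'scale'.
    partners = [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in original]
    return original + partners


pattern = glass_pattern(50, angle_deg=5.0)
print(len(pattern))  # 100 dots: 50 originals and 50 transformed partners
```

Replacing the random dots with a regular lattice in the first step would give the corresponding construction for a Marroquin pattern.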

A  second  possible  reason  that  no  systematic  programme  of  research  into  the  role  of  NNs  has 
been  undertaken  is  that  several  researchers  (e.g.,  Dakin,  1997;  Maloney,  Mitchison,  &  Barlow, 
1987) have argued (mistakenly) that the perception of structure, despite the presence of
non-corresponding NNs, shows that an NN mechanism cannot account for Glass pattern
perception. However, the distribution of NN distances for corresponding elements has a
different mean and variance to that for non-corresponding dots, and cumulative statistics
have been developed (Vickers, 2002) that make possible the reliable differentiation of
sampled distances as arising from one or the other distribution, even when several
non-corresponding distances intervene. As a result, the GT approach is capable of accounting, at a
qualitative level at least, for the observed effects of variations in noise, in element density, and
in the step size of the transformations used to produce Glass patterns. Meanwhile, in the case
of Marroquin patterns, periodic structure is revealed in the distributions of kth-order NN
distances as ascending orders of k are considered.

Finally,  an  important  class  of  pattern,  generated  by  the  same  type  of  transformation  process, 
has  been  investigated  independently  under  the  label  of  'mirror'  or  'reflective'  symmetry.  As 
recently  shown  by  Rainville  &  Kingdom  (2002),  the  region  of  maximal  sensitivity  to  reflective 
symmetry  varies  with  element  density,  as  predicted  by  an  NN  approach. 


Effects  of  boundary  shape. 

A  further,  general  factor  that  has  been  little  studied,  but  that  has  been  shown  to  affect 
performance,  is  the  shape  of  the  boundary  enclosing  a  visual  display  (Dakin  &  Bex,  2002). 

If the perception of structure is thought of as a process of signal/noise differentiation, as
proposed  by  the  GT  approach,  then  it  would  be  predicted  that  boundary  shape  would  have 
some  effect  on  perception.  The  distribution  of  all  inter-element  distances  is  influenced  by 
whether  the  elements  are  enclosed  in  a  regular  polygonal,  a  circular,  or  a  rectangular  area 
(Sheng,  1985).  In  the  case  of  a  polygonal  boundary,  the  mean  of  the  distribution  depends  on 
the number of sides, and, in the case of a rectangular boundary, the mean depends on the
height/width ratio (Lazoff & Sherman, 1994). For example, an elongated rectangle makes
possible  greater  inter-element  distances  than  a  square  of  the  same  area. 
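
The rectangular case can be illustrated with a short simulation. The Python sketch below estimates the mean of all inter-element distances for random elements in a unit square and in an elongated rectangle of the same area. The function name, the 4 by 0.25 rectangle, the sample size and the seed are arbitrary assumptions.

```python
import math
import random
from itertools import combinations


def mean_distance_in_rectangle(width, height, n=200, seed=0):
    """Mean of all inter-element distances for n random elements in a rectangle."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]
    d = [math.dist(p, q) for p, q in combinations(pts, 2)]
    return sum(d) / len(d)


square = mean_distance_in_rectangle(1.0, 1.0)      # area 1, height/width ratio 1
elongated = mean_distance_in_rectangle(4.0, 0.25)  # area 1, strongly elongated
print(f"square: {square:.3f}  elongated: {elongated:.3f}")
```

The elongated boundary should yield the larger mean, in line with the dependence on height/width ratio described above.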

Because  NN  distances  are  restricted  (by  definition)  to  elements  that  are  in  close  proximity, 
their  distribution  is  not  sensitive  to  the  overall  shape  of  an  enclosing  boundary.  However, 
corrections need to be made to allow for the effects of proximity to the boundary. As a result
of the combined effects of boundary shape and perimeter on the distinguishability of the
distributions of NN and of all inter-element distances, the shape of the enclosing boundary
is predicted to have some effect on perceptual performance.


A.3.2 The importance of symmetry and transformations for visual perception.


Symmetry. 

Symmetry  is  defined  as  invariance  under  transformation,  and  most  forms  of  structure  can  be 
regarded  as  some  form  of  symmetry.  Because  the  GT  approach  conceives  the  visual  system  as 
actively  searching  for  symmetries  (including  partial  and  probabilistic  mappings),  it  is  likely,  a 
priori,  to  have  quite  general  applicability  to  visual  perception. 

As  mentioned  earlier,  geometric  approaches  to  visual  perception  focus  on  invariants  (i.e., 
symmetries)  in  reflected  images  that  are  determined  by  relations  between  a  light  source,  an 
object  or  scene  and  an  observer.  Such  approaches  have  produced  an  extensive  literature,  that 
is  typified  by  the  so-called  ecological  approach  to  visual  perception  pioneered  by  Gibson 
(1966;  1979)  and  is  well  represented  by  recent  studies  by  Wagemans,  Van  Gool,  Lamote,  & 
Foster  (2000)  and  Cutting  &  Readinger  (2002). 

A  distinguishing  feature  of  the  GT  approach  is  that  it  permits  a  distinction  between  the  above 
kind  of  optically  determined  structure  and  what  might  be  termed  intrinsic  structure,  that  is 
due  to  the  operation  of  non-optical  constraints,  such  as  forces  of  attraction  and  repulsion, 
between  the  elements  of  an  array.  Most  geometrical  approaches  to  visual  perception  are 
concerned  with  invariants  associated  with  optically  determined  relationships.  However,  the 
process  of  perceptual  organisation  is  concerned  primarily  with  the  detection  of  intrinsic 
structure, as illustrated by the examples, cited in the previous section, of clustering and
regularity, Glass patterns, symmetry, and periodic patterns. An advantage of the GT approach
outlined  above  is  that  it  permits  an  objective  (and  ecologically  grounded)  characterisation  of 
this  kind  of  structure. 


Transformations. 

The  central  role  of  transformational  processes  in  visual  perception  and  cognition  is  also  well 
established  (e.g.,  Dodwell,  1983;  Hahn,  Chater,  &  Richardson,  2003;  Hoffman,  1984;  Leyton, 
1992;  Palmer,  1991;  1999;  Shepard,  1994).  However,  previous  transformational  approaches 
have  consisted  of  representational  hypotheses  rather  than  models  of  the  perceptual  processes. 
That  is,  they  have  been  restricted  to  correlating  the  perceived  organisation  in  an  image  with 
some  objective  measure  of  the  geometrical  characteristics  of  the  image.  In  contrast,  the  GT 
approach  is  implemented  in  a  computer  model  that  also  attempts  to  specify  the  process  by 
which  the  relevant  image  characteristics  are  selected  and  extracted  by  the  visual  system  as 
well  as  the  process  by  which  these  characteristics  select  the  transformations  that  maximise 
generalised  self-similarity  while  minimising  the  difference  between  the  generated 
representation  and  the  current  image  input. 

Vickers  (2002)  has  summarised  the  properties  of  an  earlier  version  of  this  GT  model.  The 
model  is  effective  in  identifying  transformational  structure  in  multi-element  arrays,  such  as 
Glass  and  Marroquin  patterns,  irrespective  of  the  particular  transformation  used  to  generate 
them. It is highly effective in identifying most - perhaps all - kinds of regularity in such arrays,
even when this structure is partially obscured by random elements in addition to, or instead
of,  the  structured  elements.  It  is  qualitatively  consistent  with  the  general  features  of 
performance  in  the  perception  of  visual  structure,  namely:  scale  invariance,  sensitivity  to 
element  density,  resistance  to  noise,  sensitivity  to  boundary  shape,  and  insensitivity  to  the 
order  in  which  an  array  is  scanned.  The  GT  model  can  also  be  extended  to  account  for  the 
perception  of  fractal  structure  and  the  perception  of  shape  and  orientation  in  depth. 

In  addition  to  its  account  of  perceptual  processes,  the  GT  approach  may  have  some  relevance 
for  accounts  of  the  relationship  between  perception  and  action.  As  mentioned  earlier,  the 
hypothesised  transformational  processes  are  assumed  to  have  evolved  from  internal 
directions  for  guiding  overt  manipulative  actions.  In  line  with  this,  some  researchers  have 
suggested  that  the  mental  rotation  of  stimuli  in  experiments  by  Shepard  and  others  (e.g., 
Shepard  &  Cooper,  1982)  is  closely  associated  with  pre-motor  processes  (e.g.,  Kosslyn,  1994b). 
For  example,  Wohlschlager  and  Wohlschlager  (1998)  found  that  the  making  of  rotational  hand 
movements  interfered  with  simultaneously  executed  mental  object  rotation,  but  only  if  the 
axes  of  rotation  coincided.  On  this  basis,  Wohlschlager  (2001)  suggested  that  mental  rotation  is 
an  imagined  (covert)  action,  rather  than  a  pure  visual-spatial  imagery  task.  Consistent  with 
this  view,  these  authors  showed  that  the  mere  planning  of  hand  movements  interfered  with 
mental  object  rotation. 


A.3.3  Relation  of  GT  approach  to  established  theoretical  approaches  to  visual 
perception. 

The  GT  approach  focusses  on  invariant  relations  between  stimulus  elements  that  are 
diagnostic  of  non-optical  constraints,  and  so  is  complementary  to  geometrical  approaches  that 
focus on projective invariants. That is, the GT approach assumes that, like the perception of
optical structure, the visual perception of intrinsic structure has a strong ecological basis.

Although  a  useful  heuristic,  the  optimisation  assumption  underlying  cognitive  approaches 
faces  difficulties  in  relating  the  optimised  quantity  (e.g.,  simplicity  or  likelihood)  to 
measurable  characteristics  of  visual  stimuli,  and  realistic  optimisation  tasks  present  severe 
computational  difficulties.  The  GT  approach  provides  an  objective  procedure  for  stimulus 
analysis,  while  locally-focussed  processes  that  use  NN  information  depend  linearly  on  the 
number  of  dots,  and  allow  the  system  to  produce  near-optimal  (least-distance)  solutions  at  a 
feasible  computational  cost. 

As  noted,  the  NNA  approach  is  difficult  to  reconcile  with  neurophysiological  approaches 
based  on  CRF  architectures.  However,  the  relational  perspective  of  NNA  may  provide  a 
fruitful  conceptual  framework,  consistent  with  recent  neurophysiological  work,  which  is 
based  on  evidence  for  much  wider  interconnection  among  the  elements  of  a  visual  field  than 
previously assumed, and which emphasises receptive field plasticity, context effects and
normalisation  by  element  density. 